Posted to solr-user@lucene.apache.org by Wesley Small <We...@mtvstaff.com> on 2009/03/31 17:50:27 UTC

DIH; Hardcode field value/replacement based on source column

I am trying to find a clean way to *hardcode* a field/column to a specific
value during the DIH process.  It does seem to be possible, but I am getting
a slightly invalid constant value in my index.

<field column="content_type_s" sourceColName="title_t" regex="(.*)"
replaceWith="Video" />

However, the value in the index was set to "VideoVideo" for all documents.

Any idea why this DIH instruction would make the constant value appear twice?

Thanks,
Wesley.



Re: DIH; Hardcode field value/replacement based on source column

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
Use TemplateTransformer (it has to be listed in the entity's transformer attribute):
<field column="content_type_s" template="Video" />



On Tue, Mar 31, 2009 at 9:20 PM, Wesley Small <We...@mtvstaff.com> wrote:
> I am trying to find a clean way to *hardcode* a field/column to a specific
> value during the DIH process.  It does seems to be possible but I am getting
> an slightly invalid constant value in my index.
>
> <field column="content_type_s" sourceColName="title_t" regex="(.*)"
> replaceWith="Video" />
>
> However, the value in the index was set to "VideoVideo" for all documents.
>
> Any idea why this DIH instruction would see constant value appear twice??
>
> Thanks,
> Wesley.
>
>
>



-- 
--Noble Paul

Re: DIH; Hardcode field value/replacement based on source column

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Tue, Mar 31, 2009 at 9:20 PM, Wesley Small <We...@mtvstaff.com> wrote:

> I am trying to find a clean way to *hardcode* a field/column to a specific
> value during the DIH process.  It does seems to be possible but I am
> getting
> an slightly invalid constant value in my index.
>
> <field column="content_type_s" sourceColName="title_t" regex="(.*)"
> replaceWith="Video" />
>
> However, the value in the index was set to "VideoVideo" for all documents.
>
> Any idea why this DIH instruction would see constant value appear twice??
>

Not sure. Let me run a test and get back.

However, this is a job for super....ahem, TemplateTransformer!

http://wiki.apache.org/solr/DataImportHandlerFaq
-- 
Regards,
Shalin Shekhar Mangar.

Re: DIH; Hardcode field value/replacement based on source column

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
Interestingly,
 <field column="id" regex="^(.*)$" replaceWith="$1#${x.imgvurl}"
sourceColName="fileAbsolutePath"/>

seems to work.
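
The difference between the unanchored and the anchored pattern can be reproduced outside DIH with plain java.util.regex. What follows is only a minimal sketch, assuming the transformer ends up calling Matcher.replaceAll (which the doubled value suggests). The class name, the sample path, and the literal "img.png" standing in for the resolved ${x.imgvurl} variable are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchoredReplaceDemo {

    // Substitutes every match of the regex in the input, as Matcher.replaceAll does.
    public static String apply(String regex, String replaceWith, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.replaceAll(replaceWith);
    }

    public static void main(String[] args) {
        String path = "/data/a.xml";        // hypothetical fileAbsolutePath value
        String replaceWith = "$1#img.png";  // "img.png" stands in for the resolved variable

        // Unanchored: the whole string matches, then the empty string at the end matches too.
        System.out.println(apply("(.*)", replaceWith, path));

        // Anchored at both ends: only one match is possible.
        System.out.println(apply("^(.*)$", replaceWith, path));
    }
}
```

With the unanchored pattern the suffix is appended twice (once for the full match, once for the empty match at the end); with the anchored pattern it is appended once.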

On Wed, Apr 1, 2009 at 12:13 AM, Fergus McMenemie <fe...@twig.me.uk> wrote:
> Hmmm, I am sure I have seen this as well!
>
>  <field column="id" regex="(.*)" replaceWith="$1#${x.imgvurl}" sourceColName="fileAbsolutePath"/>
>
> I get the #${x.imgvurl} added twice.
>
> Fergus.
>
>>On 3/31/09 11:50 AM, "Wesley Small" <We...@mtvstaff.com> wrote:
>>
>>> I am trying to find a clean way to *hardcode* a field/column to a specific
>>> value during the DIH process.  It does seems to be possible but I am getting
>>> an slightly invalid constant value in my index.
>>>
>>> <field column="content_type_s" sourceColName="title_t" regex="(.*)"
>>> replaceWith="Video" />
>>>
>>> However, the value in the index was set to "VideoVideo" for all documents.
>>>
>>> Any idea why this DIH instruction would see constant value appear twice??
>>>
>
> --
>
> ===============================================================
> Fergus McMenemie               Email:fergus@twig.me.uk
> Techmore Ltd                   Phone:(UK) 07721 376021
>
> Unix/Mac/Intranets             Analyst Programmer
> ===============================================================
>



-- 
--Noble Paul

Re: DIH; Hardcode field value/replacement based on source column

Posted by Fergus McMenemie <fe...@twig.me.uk>.
>: Indeed. I wrote the following test:
>: 
>: Pattern p = Pattern.compile("(.*)");
>: Matcher m = p.matcher("xyz");
>: Assert.assertEquals("", "Video", m.replaceAll("Video"));
>: 
>: The test fails. It gives "VideoVideo" as the actual result. I guess there is
>: something about Matcher.replaceAll that I don't know. Off to read the
>: javadocs then.
>
>".*" matches the empty string (for that matter, any regex clause with the 
>"*" modifier applied matches the empty string), and iterating over pattern 
>matches (i.e. what happens when you call Matcher.find() or 
>Matcher.replaceAll()) always resumes at the "first character not matched by 
>[the previous] match" (i.e. let prev = m.end(); if m.find() succeeds, then 
>prev <= m.start()).
>
>So ".*" always matches twice on any non-empty, single-line String x: once 
>when it matches from offset 0 to x.length() (end offset exclusive, so all 
>of x), and once when it matches the empty string starting and ending at 
>x.length().
>
>That's why using "^.*" doesn't have this problem: "*" is greedy, so it 
>matches once at the start of the string and then there can't be any 
>more matches.  Conversely, ".*$" and ".*\z" will still have this problem, 
>because the empty match at the end of the input also satisfies the end anchor.
>
>
>-Hoss

Hmmm, given the chance, Perl behaves the same, although attempting
to use /*/ fails. Another lesson learnt!

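#! /usr/local/bin/perl
use strict;
my($s)="cat mat rat hat";
my($c)=0;

# a: ordinary pattern, one match per "at"
print " a-match", ++$c, "='$1'\n" while( $s =~ m/(at)/g );
$c=0;
# b: unanchored .* matches the whole string, then the empty string at the end
print " b-match", ++$c, "='$1'\n" while( $s =~ m/(.*)/g );
$c=0;
# c: anchored at the start, so only one match is possible
print " c-match", ++$c, "='$1'\n" while( $s =~ m/^(.*)/g );
$c=0;
# d: anchored at the end only, so the empty match at the end still counts
print " d-match", ++$c, "='$1'\n" while( $s =~ m/(.*)$/g );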

-- 

===============================================================
Fergus McMenemie               Email:fergus@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================

Re: DIH; Hardcode field value/replacement based on source column

Posted by Chris Hostetter <ho...@fucit.org>.
: Indeed. I wrote the following test:
: 
: Pattern p = Pattern.compile("(.*)");
: Matcher m = p.matcher("xyz");
: Assert.assertEquals("", "Video", m.replaceAll("Video"));
: 
: The test fails. It gives "VideoVideo" as the actual result. I guess there is
: something about Matcher.replaceAll that I don't know. Off to read the
: javadocs then.

".*" matches the empty string (for that matter, any regex clause with the 
"*" modifier applied matches the empty string), and iterating over pattern 
matches (i.e. what happens when you call Matcher.find() or 
Matcher.replaceAll()) always resumes at the "first character not matched by 
[the previous] match" (i.e. let prev = m.end(); if m.find() succeeds, then 
prev <= m.start()).

So ".*" always matches twice on any non-empty, single-line String x: once 
when it matches from offset 0 to x.length() (end offset exclusive, so all 
of x), and once when it matches the empty string starting and ending at 
x.length().

That's why using "^.*" doesn't have this problem: "*" is greedy, so it 
matches once at the start of the string and then there can't be any 
more matches.  Conversely, ".*$" and ".*\z" will still have this problem, 
because the empty match at the end of the input also satisfies the end anchor.
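
The explanation above can be checked by listing the offsets of every match that Matcher.find() reports. This is a small sketch rather than anything taken from Solr itself; the class and method names are hypothetical, and offsets are printed as half-open ranges [start,end).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchOffsetsDemo {

    // Collects every match reported by Matcher.find() as "[start,end)".
    public static List<String> offsets(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add("[" + m.start() + "," + m.end() + ")");
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "xyz";
        // Compare the unanchored pattern with the anchored variants discussed above.
        for (String regex : new String[] {".*", "^.*", ".*$", ".*\\z"}) {
            System.out.println(regex + " -> " + offsets(regex, input));
        }
    }
}
```

For the input "xyz", the unanchored and end-anchored patterns report two matches, [0,3) and the empty match [3,3), while "^.*" reports only [0,3).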


-Hoss


Re: DIH; Hardcode field value/replacement based on source column

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Wed, Apr 1, 2009 at 12:13 AM, Fergus McMenemie <fe...@twig.me.uk> wrote:

> Hmmm, I am sure I have seen this as well!
>
>  <field column="id" regex="(.*)" replaceWith="$1#${x.imgvurl}"
> sourceColName="fileAbsolutePath"/>
>
> I get the #${x.imgvurl} added twice.
>

Indeed. I wrote the following test:

Pattern p = Pattern.compile("(.*)");
Matcher m = p.matcher("xyz");
Assert.assertEquals("", "Video", m.replaceAll("Video"));

The test fails. It gives "VideoVideo" as the actual result. I guess there is
something about Matcher.replaceAll that I don't know. Off to read the
javadocs then.

-- 
Regards,
Shalin Shekhar Mangar.

Re: DIH; Hardcode field value/replacement based on source column

Posted by Fergus McMenemie <fe...@twig.me.uk>.
Hmmm, I am sure I have seen this as well!

 <field column="id" regex="(.*)" replaceWith="$1#${x.imgvurl}" sourceColName="fileAbsolutePath"/>

I get the #${x.imgvurl} added twice.

Fergus.

>On 3/31/09 11:50 AM, "Wesley Small" <We...@mtvstaff.com> wrote:
>
>> I am trying to find a clean way to *hardcode* a field/column to a specific
>> value during the DIH process.  It does seems to be possible but I am getting
>> an slightly invalid constant value in my index.
>> 
>> <field column="content_type_s" sourceColName="title_t" regex="(.*)"
>> replaceWith="Video" />
>> 
>> However, the value in the index was set to "VideoVideo" for all documents.
>> 
>> Any idea why this DIH instruction would see constant value appear twice??
>> 

-- 

===============================================================
Fergus McMenemie               Email:fergus@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================

Re: DIH; Hardcode field value/replacement based on source column

Posted by Vernon Chapman <ch...@gmail.com>.
Wesley,

I'm not sure, but if all you want is to have "Video" in the field, I would
do it in the SQL query.

I use Postgres, and I needed all my calendar event items to have "event" in
the field named content. So I used the following as my SQL query:

select *,'event' as content from events

That way every row has a "content" column with the value "event", and the
rest is taken care of by the mapping, i.e.

<field column="content"    name="content"/>

Hope that helps


Vernon





On 3/31/09 11:50 AM, "Wesley Small" <We...@mtvstaff.com> wrote:

> I am trying to find a clean way to *hardcode* a field/column to a specific
> value during the DIH process.  It does seems to be possible but I am getting
> an slightly invalid constant value in my index.
> 
> <field column="content_type_s" sourceColName="title_t" regex="(.*)"
> replaceWith="Video" />
> 
> However, the value in the index was set to "VideoVideo" for all documents.
> 
> Any idea why this DIH instruction would see constant value appear twice??
> 
> Thanks,
> Wesley.
> 
>