Posted to solr-user@lucene.apache.org by Wesley Small <We...@mtvstaff.com> on 2009/04/01 13:45:54 UTC

DIH Date conversion from a source column skews time

I have noticed that setting a dynamic date field from a source column changes
the time within the date.  Can anyone confirm this?

For example, the document I import has the following XML field:

<field name="original_air_date_d">2002-12-18T00:00:00Z</field>


In my data-import-config file I define the following instructions:

<field column="temp_original_air_date_s"
xpath="/add/doc/field[@name='original_air_date_d']" />

<field column="original_air_year_s"
sourceColName="temp_original_air_date_s"
regex="([0-9][0-9][0-9][0-9])[- /.][0-9][0-9][- /.][0-9][0-9][T][0-9][0-9][:][0-9][0-9][:][0-9][0-9][Z]"
replaceWith="$1" />

<field column="original_air_date_d" sourceColName="temp_original_air_date_s"
dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss'Z'"/>
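
For readers unfamiliar with the regex/replaceWith pair, here is a minimal sketch in plain Java of what the year extraction above amounts to. It assumes the pair behaves like java.util.regex replaceAll with a group reference, and that the first separator class is "[- /.]" like the second one. The class and method names are made up for illustration and are not part of DIH.

```java
import java.util.regex.Pattern;

public class YearExtractSketch {
    // Same pattern as the regex attribute, written on one line.
    // Assumption: the first separator class is "[- /.]", like the second.
    static final Pattern AIR_DATE = Pattern.compile(
        "([0-9][0-9][0-9][0-9])[- /.][0-9][0-9][- /.][0-9][0-9]"
        + "[T][0-9][0-9][:][0-9][0-9][:][0-9][0-9][Z]");

    // Replace each match with capture group 1 (the year), like replaceWith="$1".
    static String extractYear(String value) {
        return AIR_DATE.matcher(value).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(extractYear("2002-12-18T00:00:00Z")); // prints 2002
    }
}
```

This only touches the string, so no time zone is involved in the year field.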


What is set in my index is the following:

<arr name="temp_original_air_date_s">
<str>2002-12-18T00:00:00Z</str>
</arr>

<arr name="original_air_year_s">
<str>2002</str>
</arr>  

<arr name="original_air_date_d">
<date>2002-12-18T05:00:00Z</date>
</arr>

You'll notice that the hour (HH) in original_air_date_d is set to 05.  It
should still be 00.  I have noticed that it changes to either 04 or 05 in
all cases within my index.
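
An 04/05 split is what one would expect if the value were parsed as US Eastern local time and then stored in UTC. The sketch below tests that hypothesis in plain Java. It assumes the conversion behaves like java.text.SimpleDateFormat, where the quoted 'Z' in the pattern is a literal character and not a zone marker, and that the importing JVM runs in America/New_York. Neither assumption is stated in this message, and the class name is made up.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DateSkewSketch {
    // Same pattern as the dateTimeFormat attribute; 'Z' is quoted, so it is a literal.
    static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ss'Z'";

    // Parse text as local time in parseZone, then print the same instant in UTC.
    static String reparse(String text, String parseZone) {
        SimpleDateFormat in = new SimpleDateFormat(PATTERN, Locale.US);
        in.setTimeZone(TimeZone.getTimeZone(parseZone));
        SimpleDateFormat out = new SimpleDateFormat(PATTERN, Locale.US);
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            Date instant = in.parse(text);
            return out.format(instant);
        } catch (ParseException e) {
            throw new IllegalArgumentException("unparseable date: " + text, e);
        }
    }

    public static void main(String[] args) {
        // Winter date: Eastern is UTC-5, so midnight becomes 05:00.
        System.out.println(reparse("2002-12-18T00:00:00Z", "America/New_York")); // 2002-12-18T05:00:00Z
        // Summer date: Eastern is UTC-4, so midnight becomes 04:00.
        System.out.println(reparse("2002-07-18T00:00:00Z", "America/New_York")); // 2002-07-18T04:00:00Z
        // Parsing in UTC leaves the hour alone.
        System.out.println(reparse("2002-12-18T00:00:00Z", "UTC"));              // 2002-12-18T00:00:00Z
    }
}
```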

In my schema, the dynamic field "*_d" is defined as:
<dynamicField name="*_d" type="date" indexed="true" stored="true"/>

Thanks,
Wesley.


Re: DIH Date conversion from a source column skews time

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
I see that the same XPATH does not work, but what is stopping you from copying
one field to another using a template?
 <field column="first_date_d"
 xpath="/add/doc/field[@name='original_air_date_d']" />
 <field column="second_date_s"
 template="${entityname.first_date_d}" />


On Fri, Apr 3, 2009 at 8:39 PM, Wesley Small <We...@mtvstaff.com> wrote:
> Okay, I will give that a try.
>
> I could resolve this another way by being able to execute the same XPATH
> retrieval twice.  Why does the following not work?
>
> <field column="first_date_d"
> xpath="/add/doc/field[@name='original_air_date_d']" />
> <field column="second_date_s"
> xpath="/add/doc/field[@name='original_air_date_d']" />
>
> When I do this, only second_date_s makes it into the index.  I know the
> first_date_d instruction is valid, but it just disappears.
>
> Any thoughts?



-- 
--Noble Paul

Re: DIH Date conversion from a source column skews time

Posted by Wesley Small <We...@mtvstaff.com>.
Okay, I will give that a try.

I could resolve this another way by being able to execute the same XPATH
retrieval twice.  Why does the following not work?

<field column="first_date_d"
xpath="/add/doc/field[@name='original_air_date_d']" />
<field column="second_date_s"
xpath="/add/doc/field[@name='original_air_date_d']" />

When I do this, only second_date_s makes it into the index.  I know the
first_date_d instruction is valid, but it just disappears.

Any thoughts?

On 4/1/09 11:59 PM, "Noble Paul നോബിള്‍  नोब्ळ्" <no...@gmail.com>
wrote:

> I guess dateFormat does the job properly but the returned value is
> changed according to timezone.
> 
> Can you try this out: add an extra field which converts the date to a
> string via toString().
> 
> <field column="original_air_date_d_str"
> template="${<entityname>.original_air_date_d}"/>
> This would add an extra field as a string to the index.


Re: DIH Date conversion from a source column skews time

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
I guess dateFormat does the job properly but the returned value is
changed according to timezone.

Can you try this out: add an extra field which converts the date to a
string via toString().

<field column="original_air_date_d_str"
template="${<entityname>.original_air_date_d}"/>
This would add an extra field as a string to the index.
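
To illustrate what such a string field could reveal, here is a small Java sketch. It assumes the parsed value is a java.util.Date and that the importing JVM's default zone is America/New_York; both are assumptions and the class name is hypothetical. Date.toString() renders in the JVM's default zone, so it would show local midnight, while the same instant printed in UTC shows the shifted hour.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class ToStringCheckSketch {
    // Returns { Date.toString() in the local zone, the same instant in UTC }.
    static String[] views(String text, String zoneId) {
        TimeZone zone = TimeZone.getTimeZone(zoneId);
        TimeZone.setDefault(zone); // assumption: stands in for the importing JVM's zone
        // The quoted 'Z' is a literal, so the text is read as local time in `zone`.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.US);
        fmt.setTimeZone(zone);
        try {
            Date parsed = fmt.parse(text);
            return new String[] { parsed.toString(), parsed.toInstant().toString() };
        } catch (ParseException e) {
            throw new IllegalArgumentException("unparseable date: " + text, e);
        }
    }

    public static void main(String[] args) {
        String[] v = views("2002-12-18T00:00:00Z", "America/New_York");
        System.out.println("local view: " + v[0]); // hour reads 00:00:00 in the local zone
        System.out.println("UTC view:   " + v[1]); // 2002-12-18T05:00:00Z
    }
}
```

If the string field shows midnight local time while the date field shows 05:00Z, the parse itself is consistent and only the zone used for parsing differs from UTC.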



On Wed, Apr 1, 2009 at 11:31 PM, Wesley Small <We...@mtvstaff.com> wrote:
> Was there any follow-up to this issue I found?  Is this a legitimate bug
> with the time of day changing?
>
> I could try to solve this by executing the same xpath statement twice.
>
> <field column="original_air_date_d"
> xpath="/add/doc/field[@name='original_air_date_d']" />
>
> <field column="temp_original_air_date_s"
> xpath="/add/doc/field[@name='original_air_date_d']" />
>
> However, when I do that, the first field original_air_date_d does not make
> it into the index. It seems that you cannot have two identical xpath
> statements in the data import config file. Is this by design?



-- 
--Noble Paul

Re: DIH Date conversion from a source column skews time

Posted by Wesley Small <We...@mtvstaff.com>.
Was there any follow-up to this issue I found?  Is this a legitimate bug
with the time of day changing?

I could try to solve this by executing the same xpath statement twice.

<field column="original_air_date_d"
xpath="/add/doc/field[@name='original_air_date_d']" />

<field column="temp_original_air_date_s"
xpath="/add/doc/field[@name='original_air_date_d']" />

However, when I do that, the first field original_air_date_d does not make
it into the index. It seems that you cannot have two identical xpath
statements in the data import config file. Is this by design?

