Posted to solr-user@lucene.apache.org by Wesley Small <We...@mtvstaff.com> on 2009/04/01 13:45:54 UTC
DIH Date conversion from a source column skews time
I have noticed that setting a dynamic date field from source column changes
the time within the date. Can anyone confirm this?
For example, the document I import has the following XML field:
<field name="original_air_date_d">2002-12-18T00:00:00Z</field>
In my data-import-config file I define the following instructions:
<field column="temp_original_air_date_s"
xpath="/add/doc/field[@name='original_air_date_d']" />
<field column="original_air_year_s"
sourceColName="temp_original_air_date_s"
regex="([0-9][0-9][0-9][0-9])[- /.][0-9][0-9][- /.][0-9][0-9][T][0-9][0-9][:][0-9][0-9][:][0-9][0-9][Z]"
replaceWith="$1" />
<field column="original_air_date_d" sourceColName="temp_original_air_date_s"
dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss'Z'"/>
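The year-extraction rule above can be tried outside Solr. This is only a minimal sketch, assuming the regex/replaceWith pair behaves like Java's String.replaceAll with a group reference; YearExtractDemo and extractYear are names made up for this example and are not part of DIH.

```java
public class YearExtractDemo {
    // Regex from the config, written as one unbroken pattern.
    static final String DATE_REGEX =
        "([0-9][0-9][0-9][0-9])[- /.][0-9][0-9][- /.][0-9][0-9]"
        + "[T][0-9][0-9][:][0-9][0-9][:][0-9][0-9][Z]";

    // Replace the whole matched timestamp with capture group 1 (the year).
    // A value that does not match the pattern is returned unchanged.
    static String extractYear(String value) {
        return value.replaceAll(DATE_REGEX, "$1");
    }

    public static void main(String[] args) {
        System.out.println(extractYear("2002-12-18T00:00:00Z")); // prints 2002
    }
}
```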
What is set in my index is the following:
<arr name="temp_original_air_date_s">
<str>2002-12-18T00:00:00Z</str>
</arr>
<arr name="original_air_year_s">
<str>2002</str>
</arr>
<arr name="original_air_date_d">
<date>2002-12-18T05:00:00Z</date>
</arr>
You'll notice that the hour (HH) in original_air_date_d is set to 05. It
should still be 00. I have noticed that it changes to either 04 or 05 in
all cases within my index.
In my schema, the dynamic field "*_d" is defined as:
<dynamicField name="*_d" type="date" indexed="true" stored="true"/>
Thanks,
Wesley.
Re: DIH Date conversion from a source column skews time
Posted by Noble Paul നോബിള് नोब्ळ् <no...@gmail.com>.
If the same XPath does not work twice, what is stopping you from copying
one field to another using a template?
<field column="first_date_d"
xpath="/add/doc/field[@name='original_air_date_d']" />
<field column="second_date_s"
template="${entityname.first_date_d}" />
On Fri, Apr 3, 2009 at 8:39 PM, Wesley Small <We...@mtvstaff.com> wrote:
> Okay, I will give that a try.
>
> I could resolve this another way if I were able to execute the same XPath
> retrieval twice. Why does the following not work:
>
> <field column="first_date_d"
> xpath="/add/doc/field[@name='original_air_date_d']" />
> <field column="second_date_s"
> xpath="/add/doc/field[@name='original_air_date_d']" />
>
> When I do this, only second_date_s makes it into the index. I know the
> first_date_d instruction is valid, but it just disappears.
>
> Any thoughts?
--
--Noble Paul
Re: DIH Date conversion from a source column skews time
Posted by Wesley Small <We...@mtvstaff.com>.
Okay, I will give that a try.
I could resolve this another way if I were able to execute the same XPath
retrieval twice. Why does the following not work:
<field column="first_date_d"
xpath="/add/doc/field[@name='original_air_date_d']" />
<field column="second_date_s"
xpath="/add/doc/field[@name='original_air_date_d']" />
When I do this, only second_date_s makes it into the index. I know the
first_date_d instruction is valid, but it just disappears.
Any thoughts?
On 4/1/09 11:59 PM, "Noble Paul നോബിള് नोब्ळ्" <no...@gmail.com>
wrote:
> I guess dateFormat does the job properly but the returned value is
> changed according to timezone.
>
> Can you try this out: add an extra field that converts the date with toString().
>
> <field column="original_air_date_d_str"
> template="${<entityname>.original_air_date_d}"/>
> This would add an extra field to the index as a string.
Re: DIH Date conversion from a source column skews time
Posted by Noble Paul നോബിള് नोब्ळ् <no...@gmail.com>.
I guess dateFormat does the job properly but the returned value is
changed according to timezone.
Can you try this out: add an extra field that converts the date with toString().
<field column="original_air_date_d_str"
template="${<entityname>.original_air_date_d}"/>
This would add an extra field to the index as a string.
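To illustrate the time zone effect described above, here is a minimal sketch. It assumes the conversion goes through java.text.SimpleDateFormat and that the JVM's default zone is US Eastern; neither is confirmed in this thread, the 04/05 offsets merely fit that zone. Because 'Z' is quoted in the pattern, it is matched as a literal letter rather than as a UTC marker, so the text is read as local time. QuotedZDemo and parseThenPrintUtc are names made up for this example.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class QuotedZDemo {
    // Same pattern as in the config. The quoted 'Z' is a literal character
    // and carries no time zone information.
    static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ss'Z'";

    // Parse `text` as wall-clock time in `parseZone` (standing in for the
    // JVM default zone, an assumption), then print the same instant in UTC.
    static String parseThenPrintUtc(String text, String parseZone) {
        SimpleDateFormat parser = new SimpleDateFormat(PATTERN);
        parser.setTimeZone(TimeZone.getTimeZone(parseZone));
        SimpleDateFormat utc = new SimpleDateFormat(PATTERN);
        utc.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            Date instant = parser.parse(text);
            return utc.format(instant);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + text, e);
        }
    }

    public static void main(String[] args) {
        // Winter (EST, UTC-5): prints 2002-12-18T05:00:00Z
        System.out.println(parseThenPrintUtc("2002-12-18T00:00:00Z", "America/New_York"));
        // Summer (EDT, UTC-4): prints 2002-07-18T04:00:00Z
        System.out.println(parseThenPrintUtc("2002-07-18T00:00:00Z", "America/New_York"));
        // Parsed as UTC: prints 2002-12-18T00:00:00Z
        System.out.println(parseThenPrintUtc("2002-12-18T00:00:00Z", "UTC"));
    }
}
```

If this is what is happening, the 04 and 05 hours would correspond to daylight and standard time, and having the parse run in UTC (for example by starting the JVM with -Duser.timezone=UTC) would be expected to leave the hour at 00.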
On Wed, Apr 1, 2009 at 11:31 PM, Wesley Small <We...@mtvstaff.com> wrote:
> Was there any follow up to this issue I found? Is this a legitimate bug
> with the time of day changing?
>
> I could try to solve this by executing the same XPath statement twice:
>
> <field column="original_air_date_d"
> xpath="/add/doc/field[@name='original_air_date_d']" />
>
> <field column="temp_original_air_date_s"
> xpath="/add/doc/field[@name='original_air_date_d']" />
>
> However, when I do that, the first field, original_air_date_d, does not make
> it into the index. It seems that you cannot have two identical XPath
> statements in the data import config file. Is this by design?
--
--Noble Paul
Re: DIH Date conversion from a source column skews time
Posted by Wesley Small <We...@mtvstaff.com>.
Was there any follow up to this issue I found? Is this a legitimate bug
with the time of day changing?
I could try to solve this by executing the same XPath statement twice:
<field column="original_air_date_d"
xpath="/add/doc/field[@name='original_air_date_d']" />
<field column="temp_original_air_date_s"
xpath="/add/doc/field[@name='original_air_date_d']" />
However, when I do that, the first field, original_air_date_d, does not make
it into the index. It seems that you cannot have two identical XPath
statements in the data import config file. Is this by design?