Posted to solr-user@lucene.apache.org by Michel Bottan <fr...@gmail.com> on 2009/10/22 21:52:07 UTC

Environment Timezone being considered when using SolrJ

Hi,

When using SolrJ I've noticed that document dates are being modified according
to the environment UTC timezone. The timezone is being set in the inner
class ISO8601CanonicalDateFormat of the DateField class.

I've read some posts where people say Solr should be as locale and culture
agnostic as possible. So, what's the purpose of that timezone processing before
delivering the content to the client? Any other thoughts related to this
issue?

Code to simulate issue:

import java.io.IOException;

import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.solr.schema.DateField;

public class Teste {
    public static void main(String[] args) throws IOException {
        Field field = new Field("update", "2009-10-02T13:17:00Z", Store.NO, Index.ANALYZED);

        DateField dateField = new DateField();
        System.out.println(dateField.toObject(field));
    }
}

Thanks in advance!
Michel Bottan

Re: Environment Timezone being considered when using SolrJ

Posted by Michel Bottan <fr...@gmail.com>.
Hi Hoss,

Thanks for the clarification again.

Now I can see where the problem resides. My client application was
formatting date fields using SimpleDateFormat and, as you said, it assumes
the host's timezone configuration.

: your dateFormat object doesn't know that the 'Z' at the end of the string
: you are asking it to parse means it's in UTC

That is indeed true; I had not realized it. Now I am setting the TimeZone to
UTC and the dates are being shown accordingly.
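
For readers who prefer the other option mentioned in the quoted reply (a
pattern that includes a time zone pattern letter), the following is a minimal
sketch, not code from this thread. It assumes Java 7 or later, where
SimpleDateFormat supports the ISO 8601 pattern letter 'X'; the class name
ZoneLetterDemo and the helper parseIso are hypothetical names chosen for the
example.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class ZoneLetterDemo {

    // Hypothetical helper: the unquoted 'X' makes the parser read the
    // trailing "Z" as the UTC designator instead of a literal character,
    // so the host's default time zone no longer affects the result.
    public static Date parseIso(String text) throws ParseException {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssX");
        return fmt.parse(text);
    }

    public static void main(String[] args) throws ParseException {
        Date parsed = parseIso("2010-10-10T10:10:10Z");
        // Milliseconds since the epoch for 2010-10-10T10:10:10 UTC.
        System.out.println(parsed.getTime()); // prints 1286705410000
    }
}
```

With this pattern no setTimeZone call is needed for parsing, although
formatting a Date for display still uses whatever zone the formatter is set to.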

Michel


On Wed, Oct 28, 2009 at 2:38 AM, Chris Hostetter
<ho...@fucit.org> wrote:

>
> : I've written a unit test in order to simulate the date processing. A high
>
> I think you are misunderstanding what your test is doing, but I'll get to
> that in a second...
>
> : level detail of this problem is that it occurs only when the JavaBin
> : custom format (&wt=javabin) is used; in this case the dates get back set with
> : environment UTC offset coordinates.
>
> ...if the only time you see a problem is when you use javabin, then a
> testcase demonstrating that (even if it depends on an external Solr port
> being up, ie: the example running on 8983) would be helpful.
>
>        ...but...
>
> I suspect there's a typo in that sentence above, and what you meant to
> write was "dates get SENT BACK with environment UTC offset coordinates"
> ... which strictly speaking isn't possible: a "Date" instance (both
> philosophically, and in the Java object sense) has no notion of timezone.
> It is a specific point in the one-dimensional space of time, and it's
> only when expressing Dates that they wind up being relative to a reference
> point (ie: a coordinate system in that 1D space) and the concept of
> timezones gets introduced when using a string-based representation.
>
> I believe what you are observing is that the *string* representation of
> the data is formatted in UTC -- which is expected: that is a string
> representation of a Date agnostic of any specific timezone.
>
> As for your test...
>
> The reason the assert fails is that you are building
> your Date object using a SimpleDateFormat object, which by default assumes
> any string it parses is in the TimeZone returned by TimeZone.getDefault()
> (which is host specific).  You can configure it to assume a different
> TimeZone using the setTimeZone method, or by using a pattern that includes
> a TimeZone pattern letter.
>
> In a nutshell: your dateFormat object doesn't know that the 'Z' at the end
> of the string you are asking it to parse means it's in UTC, so it assumes
> you mean 10:10:10 in *your* timezone.  (If you add a
> "System.out.println(originalDateObject)" this should be clear.)
> Meanwhile: Solr does recognize that the 'Z' indicates UTC, hence the
> mismatch in Date objects produced.
>
> :         //Given
> :         String originalDateString = "2010-10-10T10:10:10Z";
> :
> :         //When
> :         Field field = new Field("field", originalDateString, Store.NO, Index.ANALYZED);
> :         DateField dateField = new DateField();
> :
> :         SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
> :         Date originalDateObject = dateFormat.parse(originalDateString);
> :         Date parsedDate = dateField.toObject(field);
> :
> :         //Then
> :         assertEquals(originalDateObject, parsedDate);
>
>
>
> -Hoss
>
>

Re: Environment Timezone being considered when using SolrJ

Posted by Chris Hostetter <ho...@fucit.org>.
: I've written a unit test in order to simulate the date processing. A high

I think you are misunderstanding what your test is doing, but I'll get to
that in a second...

: level detail of this problem is that it occurs only when the JavaBin
: custom format (&wt=javabin) is used; in this case the dates get back set with
: environment UTC offset coordinates.

...if the only time you see a problem is when you use javabin, then a 
testcase demonstrating that (even if it depends on an external Solr port 
being up, ie: the example running on 8983) would be helpful.

	...but...

I suspect there's a typo in that sentence above, and what you meant to
write was "dates get SENT BACK with environment UTC offset coordinates"
... which strictly speaking isn't possible: a "Date" instance (both
philosophically, and in the Java object sense) has no notion of timezone.
It is a specific point in the one-dimensional space of time, and it's
only when expressing Dates that they wind up being relative to a reference
point (ie: a coordinate system in that 1D space) and the concept of
timezones gets introduced when using a string-based representation.

I believe what you are observing is that the *string* representation of
the data is formatted in UTC -- which is expected: that is a string
representation of a Date agnostic of any specific timezone.
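
As an illustration of the point that a Date is a bare instant and that
timezones only appear in its string form, here is a minimal sketch. It is not
taken from Solr or from the original poster's code: the class name
DateHasNoZone, the helper formatIn, and the fixed offset GMT-03:00 are
assumptions picked for the example.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class DateHasNoZone {

    // Hypothetical helper: renders the same instant as wall-clock time
    // in the time zone identified by zoneId.
    public static String formatIn(Date date, String zoneId) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss");
        fmt.setTimeZone(TimeZone.getTimeZone(zoneId));
        return fmt.format(date);
    }

    public static void main(String[] args) {
        // 1286705410000 ms since the epoch is 2010-10-10T10:10:10 in UTC.
        Date instant = new Date(1286705410000L);

        System.out.println(formatIn(instant, "UTC"));       // 2010-10-10T10:10:10
        System.out.println(formatIn(instant, "GMT-03:00")); // 2010-10-10T07:10:10

        // The Date itself is unchanged; only its string form differs.
        System.out.println(instant.getTime());              // 1286705410000
    }
}
```

Both printed strings describe the same Date object, so comparing strings
produced in different zones says nothing about whether the underlying Dates
are equal.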

As for your test...

The reason the assert fails is that you are building
your Date object using a SimpleDateFormat object, which by default assumes
any string it parses is in the TimeZone returned by TimeZone.getDefault()
(which is host specific).  You can configure it to assume a different
TimeZone using the setTimeZone method, or by using a pattern that includes
a TimeZone pattern letter.

In a nutshell: your dateFormat object doesn't know that the 'Z' at the end
of the string you are asking it to parse means it's in UTC, so it assumes
you mean 10:10:10 in *your* timezone.  (If you add a
"System.out.println(originalDateObject)" this should be clear.)
Meanwhile: Solr does recognize that the 'Z' indicates UTC, hence the
mismatch in Date objects produced.

:         //Given
:         String originalDateString = "2010-10-10T10:10:10Z";
: 
:         //When
:         Field field = new Field("field", originalDateString, Store.NO, Index.ANALYZED);
:         DateField dateField = new DateField();
:
:         SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
:         Date originalDateObject = dateFormat.parse(originalDateString);
:         Date parsedDate = dateField.toObject(field);
: 
:         //Then
:         assertEquals(originalDateObject, parsedDate);
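
The mismatch described above can be reproduced without Solr. The sketch below
is only an illustration under stated assumptions: the class name ParseZoneDemo
and the helper parseIn are hypothetical, and the explicit zone argument stands
in for whatever TimeZone.getDefault() returns on a given host.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class ParseZoneDemo {

    // Same pattern as the quoted test: the quoted 'Z' is a literal character,
    // so it carries no time zone information for the parser.
    static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ss'Z'";

    // Hypothetical helper: parses text, interpreting its fields in the given zone.
    public static Date parseIn(String text, TimeZone zone) throws ParseException {
        SimpleDateFormat fmt = new SimpleDateFormat(PATTERN);
        fmt.setTimeZone(zone);
        return fmt.parse(text);
    }

    public static void main(String[] args) throws ParseException {
        String text = "2010-10-10T10:10:10Z";

        // What Solr means by the string: 10:10:10 in UTC.
        Date asUtc = parseIn(text, TimeZone.getTimeZone("UTC"));

        // What an unconfigured SimpleDateFormat produces: 10:10:10 in the host zone.
        Date asHostZone = parseIn(text, TimeZone.getDefault());

        System.out.println("UTC:   " + asUtc.getTime());
        System.out.println("host:  " + asHostZone.getTime());
        // True only on hosts whose offset from UTC was zero at that instant.
        System.out.println("equal: " + asUtc.equals(asHostZone));
    }
}
```

On a host three hours behind UTC the two values differ by exactly three hours,
which is the difference the failing assertEquals reports.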



-Hoss


Re: Environment Timezone being considered when using SolrJ

Posted by Michel Bottan <fr...@gmail.com>.
Hi Hoss,

Thanks for the clarification.

I've written a unit test in order to simulate the date processing. A high
level detail of this problem is that it occurs only when the JavaBin
custom format (&wt=javabin) is used; in this case the dates get back set with
environment UTC offset coordinates.


On Thu, Oct 22, 2009 at 11:41 PM, Chris Hostetter
<ho...@fucit.org> wrote:

>
> : When using SolrJ I've noticed that document dates are being modified
> : according to the environment UTC timezone. The timezone is being set in
> : the inner class ISO8601CanonicalDateFormat of the DateField class.
>
> The dates aren't "modified" based on UTC, they are formatted in UTC before
> being written to the Lucene index so that no matter what the current
> locale is the index format is consistent.
>

Yes, dates are consistent in the index.


>
> : I've read some posts where people say Solr should be as locale and
> : culture agnostic as possible. So, what's the purpose of that timezone
> : processing before
>
> The use of UTC is specifically to be agnostic of where the server is
> running.  Any client, anywhere in the world, using any TimeZone can query
> any Solr server, running in any JVM, and know that the dates it gets back
> are formatted in UTC.
>
> : Code to simulate issue:
>
> I don't actually see any "issue" being simulated in this code, can you
> elaborate on how exactly it's behaving in a way that's inconsistent with
> your expectations?  (Making it a JUnit TestCase that uses asserts to fail
> where you are getting data you don't expect is pretty much the universal
> way to describe a bug.)
>


import static org.junit.Assert.assertEquals;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.solr.schema.DateField;
import org.junit.Test;

public class DateFieldTest {

    @Test
    public void shouldReturnSameDateValueWhenDateFieldIsUsedToParseDates()
            throws ParseException {

        //Given
        String originalDateString = "2010-10-10T10:10:10Z";

        //When
        Field field = new Field("field", originalDateString, Store.NO, Index.ANALYZED);
        DateField dateField = new DateField();

        SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        Date originalDateObject = dateFormat.parse(originalDateString);
        Date parsedDate = dateField.toObject(field);

        //Then
        assertEquals(originalDateObject, parsedDate);

        /* TO MAKE TEST PASS
         * Solr 1.3
         *
         * Comment line 271 at org.apache.solr.schema.DateField
         *  this.setTimeZone(CANONICAL_TZ);
         */
    }
}


>
> My guess would be that you are getting confused by the fact that
> Date.toString() uses the default time zone of your JVM to generate a string,
> which is why the data getting printed out doesn't match the hardcoded
> value in your code...
>
> :         System.out.println(dateField.toObject(field));
>
> but if you take any Date object you want, print its toString(), index it,
> and then take that indexed string representation and convert it back into a
> Date (using dateField.toObject()) you should get originalDate.equals(newDate).
>

I was expecting this behaviour and I get it when performing an HTTP query
with the XMLResponseWriter. But the same does not occur when the
BinaryResponseWriter is used.



>
>
>
> -Hoss
>
>

Thanks!
Michel

Re: Environment Timezone being considered when using SolrJ

Posted by Chris Hostetter <ho...@fucit.org>.
: When using SolrJ I've noticed that document dates are being modified according
: to the environment UTC timezone. The timezone is being set in the inner
: class ISO8601CanonicalDateFormat of the DateField class.

The dates aren't "modified" based on UTC, they are formatted in UTC before
being written to the Lucene index so that no matter what the current
locale is the index format is consistent.

: I've read some posts where people say Solr should be as locale and culture
: agnostic as possible. So, what's the purpose of that timezone processing before

The use of UTC is specifically to be agnostic of where the server is
running.  Any client, anywhere in the world, using any TimeZone can query
any Solr server, running in any JVM, and know that the dates it gets back
are formatted in UTC.

: Code to simulate issue:

I don't actually see any "issue" being simulated in this code, can you
elaborate on how exactly it's behaving in a way that's inconsistent with
your expectations?  (Making it a JUnit TestCase that uses asserts to fail
where you are getting data you don't expect is pretty much the universal
way to describe a bug.)

My guess would be that you are getting confused by the fact that
Date.toString() uses the default time zone of your JVM to generate a string,
which is why the data getting printed out doesn't match the hardcoded
value in your code...

:         System.out.println(dateField.toObject(field));

but if you take any Date object you want, print its toString(), index it,
and then take that indexed string representation and convert it back into a
Date (using dateField.toObject()) you should get originalDate.equals(newDate).
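
The round trip described above can be sketched with plain JDK classes. This is
a stand-in under stated assumptions, not Solr's DateField implementation: the
class UtcRoundTrip and its helpers toIndexedForm and fromIndexedForm are
hypothetical, and the pattern drops milliseconds, so the example uses a Date
that falls on a whole second.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class UtcRoundTrip {

    // Builds a formatter for the "...Z" form with its zone pinned to UTC,
    // so the host's default time zone plays no part in either direction.
    static SimpleDateFormat utcFormat() {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt;
    }

    // Hypothetical stand-in for writing a Date as an indexed string.
    public static String toIndexedForm(Date date) {
        return utcFormat().format(date);
    }

    // Hypothetical stand-in for reading the indexed string back into a Date.
    public static Date fromIndexedForm(String text) throws ParseException {
        return utcFormat().parse(text);
    }

    public static void main(String[] args) throws ParseException {
        Date original = new Date(1286705410000L); // whole second, no millis
        String indexed = toIndexedForm(original);
        Date restored = fromIndexedForm(indexed);

        System.out.println(indexed);                   // 2010-10-10T10:10:10Z
        System.out.println(original.equals(restored)); // true
    }
}
```

Because both directions use the same fixed zone, the result is the same on any
host, which is the property the reply attributes to Solr's use of UTC.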



-Hoss