Posted to solr-user@lucene.apache.org by Michel Bottan <fr...@gmail.com> on 2009/10/22 21:52:07 UTC
Environment Timezone being considered when using SolrJ
Hi,
When using SolrJ I've noticed that document dates are being modified according
to the environment's UTC timezone offset. The timezone is being set in the inner
class ISO8601CanonicalDateFormat of the DateField class.
I've read some posts where people say Solr should be as locale and culture
agnostic as possible. So, what's the purpose of that timezone processing before
delivering the content to the client? Any other thoughts related to this
issue?
Code to simulate issue:
import java.io.IOException;

import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.solr.schema.DateField;

public class Teste {
    public static void main(String[] args) throws IOException {
        Field field = new Field("update", "2009-10-02T13:17:00Z", Store.NO, Index.ANALYZED);
        DateField dateField = new DateField();
        System.out.println(dateField.toObject(field));
    }
}
Thanks in advance!
Michel Bottan
Re: Environment Timezone being considered when using SolrJ
Posted by Michel Bottan <fr...@gmail.com>.
Hi Hoss,
Thanks for the clarification again.
Now I can see where the problem resides. My client application was
formatting date fields using SimpleDateFormat and as you said, it assumes
host timezone configuration.
: your dateFormat object doesn't know that the 'Z' at the end of the string
you are asking it to parse means it's in UTC
That is indeed true; I had not realized it. I am now setting the TimeZone to
UTC and the dates are being shown correctly.
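For readers following along, here is a minimal sketch of the mismatch and of the fix described above. It is not code from the thread: the class name LiteralZDemo and the helper parseMillis are hypothetical, and only standard java.text and java.util classes are used. The same string is parsed with the same pattern, and only the zone the formatter is told to assume changes.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; not part of Solr or SolrJ.
class LiteralZDemo {
    // The quoted 'Z' is matched as a plain character and carries no zone information.
    static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ss'Z'";

    // Parses text while assuming the given zone and returns epoch milliseconds.
    static long parseMillis(String text, TimeZone assumedZone) {
        SimpleDateFormat format = new SimpleDateFormat(PATTERN, Locale.ROOT);
        format.setTimeZone(assumedZone);
        try {
            return format.parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + text, e);
        }
    }

    public static void main(String[] args) {
        String text = "2010-10-10T10:10:10Z";
        long asUtc = parseMillis(text, TimeZone.getTimeZone("UTC"));
        long asHost = parseMillis(text, TimeZone.getDefault());
        // The difference is zero only when the host zone has no UTC offset on that date.
        System.out.println("host-zone parse differs from UTC parse by "
                + (asHost - asUtc) + " ms");
    }
}
```

Passing TimeZone.getTimeZone("UTC") corresponds to the setTimeZone fix; passing TimeZone.getDefault() reproduces what an unconfigured SimpleDateFormat does.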
Michel
On Wed, Oct 28, 2009 at 2:38 AM, Chris Hostetter
<ho...@fucit.org> wrote:
>
> : I've written a unit test in order to simulate the date processing. A high
>
> I think you are misunderstanding what your test is doing, but I'll get to
> that in a second...
>
> : level detail of this problem is that it occurs only when used the JavaBin
> : custom format (&wt=javabin), in this case the dates get back set with
> : environment UTC offset coordinates.
>
> ...if the only time you see a problem is when you use javabin, then a
> testcase demonstrating that (even if it depends on an external Solr port
> being up, ie: the example running on 8983) would be helpful.
>
> ...but...
>
> I suspect there's a typo in that sentence above, and what you meant to
> write was "dates get SENT BACK with environment UTC offset coordinates"
> ... which strictly speaking isn't possible: A "Date" instance (both
> philosophically, and in the Java object sense) has no notion of timezone.
> It is a specific point in the one dimensional space of time, and it's
> only when expressing Dates that they wind up being relative to a reference
> point (ie: a coordinate system in that 1D space) and the concept of
> timezones gets introduced when using a string based representation.
>
> I believe what you are observing is that the *string* representation of
> the data is formatted in UTC -- which is expected, that is a string
> representation of a Date agnostic of any specific timezone.
>
> As for your test...
>
> The reason the assert fails is that you are building
> your Date object using a SimpleDateFormat object, which by default assumes
> any string it parses is in the TimeZone returned by TimeZone.getDefault()
> (which is host specific). You can configure it to assume a different
> TimeZone using the setTimeZone method, or by using a pattern that includes
> a TimeZone pattern letter.
>
> In a nutshell: your dateFormat object doesn't know that the 'Z' at the end
> of the string you are asking it to parse means it's in UTC, so it assumes
> you mean 10:10:10 in *your* timezone. (If you add a
> "System.out.println(originalDateObject)" this should be clear.)
> Meanwhile: Solr does recognize that the 'Z' indicates UTC, hence the
> mismatch in Date objects produced.
>
> : //Given
> : String originalDateString = "2010-10-10T10:10:10Z";
> :
> : //When
> : Field field = new
> : Field("field",originalDateString,Store.NO,Index.ANALYZED);
> : DateField dateField = new DateField();
> :
> : SimpleDateFormat dateFormat = new
> : SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
> : Date originalDateObject = dateFormat.parse(originalDateString);
> : Date parsedDate = dateField.toObject(field);
> :
> : //Then
> : assertEquals(originalDateObject, parsedDate);
>
>
>
> -Hoss
>
>
Re: Environment Timezone being considered when using SolrJ
Posted by Chris Hostetter <ho...@fucit.org>.
: I've written a unit test in order to simulate the date processing. A high
I think you are misunderstanding what your test is doing, but I'll get to
that in a second...
: level detail of this problem is that it occurs only when used the JavaBin
: custom format (&wt=javabin), in this case the dates get back set with
: environment UTC offset coordinates.
...if the only time you see a problem is when you use javabin, then a
testcase demonstrating that (even if it depends on an external Solr port
being up, ie: the example running on 8983) would be helpful.
...but...
I suspect there's a typo in that sentence above, and what you meant to
write was "dates get SENT BACK with environment UTC offset coordinates"
... which strictly speaking isn't possible: A "Date" instance (both
philosophically, and in the Java object sense) has no notion of timezone.
It is a specific point in the one dimensional space of time, and it's
only when expressing Dates that they wind up being relative to a reference
point (ie: a coordinate system in that 1D space) and the concept of
timezones gets introduced when using a string based representation.
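The point about a Date having no timezone can be illustrated with a small sketch. This is an editorial example rather than anything posted in the thread: the class name DateHasNoZone, the helper render, and the GMT-03:00 offset are illustrative choices, and the XXX pattern letters assume Java 7 or later. One Date is rendered in two zones; the strings differ while the underlying instant does not.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; not part of Solr or SolrJ.
class DateHasNoZone {
    // Renders the same instant as a string relative to the given zone.
    static String render(Date date, String zoneId) {
        SimpleDateFormat format =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssXXX", Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone(zoneId));
        return format.format(date);
    }

    public static void main(String[] args) {
        Date instant = new Date(1286705410000L); // 2010-10-10T10:10:10 in UTC
        System.out.println(render(instant, "UTC"));       // 2010-10-10T10:10:10Z
        System.out.println(render(instant, "GMT-03:00")); // 2010-10-10T07:10:10-03:00
        // The Date itself is unchanged by either rendering.
        System.out.println(instant.getTime());            // 1286705410000
    }
}
```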
I believe what you are observing is that the *string* representation of
the data is formatted in UTC -- which is expected, that is a string
representation of a Date agnostic of any specific timezone.
As for your test...
The reason the assert fails is that you are building
your Date object using a SimpleDateFormat object, which by default assumes
any string it parses is in the TimeZone returned by TimeZone.getDefault()
(which is host specific). You can configure it to assume a different
TimeZone using the setTimeZone method, or by using a pattern that includes
a TimeZone pattern letter.
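The second option mentioned above, a pattern that includes a time zone letter, might look like the following sketch. It assumes Java 7 or later, where SimpleDateFormat supports the ISO 8601 pattern letter X; the class name ZoneAwarePattern and the helper parseIso are hypothetical. The formatter is deliberately given a non-UTC zone to show that the zone designator in the text takes precedence when the pattern actually parses it.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; not part of Solr or SolrJ.
class ZoneAwarePattern {
    // XXX parses an ISO 8601 zone designator: either "Z" or an offset such as "-03:00".
    static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ssXXX";

    static Date parseIso(String text) {
        SimpleDateFormat format = new SimpleDateFormat(PATTERN, Locale.ROOT);
        // Intentionally not UTC: the zone found in the text overrides this setting.
        format.setTimeZone(TimeZone.getTimeZone("GMT-03:00"));
        try {
            return format.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + text, e);
        }
    }

    public static void main(String[] args) {
        Date fromZ = parseIso("2010-10-10T10:10:10Z");
        Date fromOffset = parseIso("2010-10-10T07:10:10-03:00");
        // Both strings name the same instant, so the Dates are equal.
        System.out.println(fromZ.equals(fromOffset)); // true
    }
}
```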
In a nutshell: your dateFormat object doesn't know that the 'Z' at the end
of the string you are asking it to parse means it's in UTC, so it assumes
you mean 10:10:10 in *your* timezone. (If you add a
"System.out.println(originalDateObject)" this should be clear.)
Meanwhile: Solr does recognize that the 'Z' indicates UTC, hence the
mismatch in Date objects produced.
: //Given
: String originalDateString = "2010-10-10T10:10:10Z";
:
: //When
: Field field = new
: Field("field",originalDateString,Store.NO,Index.ANALYZED);
: DateField dateField = new DateField();
:
: SimpleDateFormat dateFormat = new
: SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
: Date originalDateObject = dateFormat.parse(originalDateString);
: Date parsedDate = dateField.toObject(field);
:
: //Then
: assertEquals(originalDateObject, parsedDate);
-Hoss
Re: Environment Timezone being considered when using SolrJ
Posted by Michel Bottan <fr...@gmail.com>.
Hi Hoss,
Thanks for the clarification.
I've written a unit test in order to simulate the date processing. A high
level detail of this problem is that it occurs only when used the JavaBin
custom format (&wt=javabin), in this case the dates get back set with
environment UTC offset coordinates.
On Thu, Oct 22, 2009 at 11:41 PM, Chris Hostetter
<ho...@fucit.org> wrote:
>
> : When using SolrJ I've realized document dates are being modified
> according
> : to the environment UTC timezone. The timezone is being set in the inner
> : class ISO8601CanonicalDateFormat of DateField class.
>
> The dates aren't "modified" based on UTC; they are formatted in UTC before
> being written to the Lucene index so that no matter what the current
> locale is the index format is consistent.
>
Yes, the dates are consistent in the index.
>
> : I've read some posts where people say Solr should be most locale and
> culture
> : agnostic. So, what's the purpose for that timezone processing before
>
> The use of UTC is specifically to be agnostic of where the server is
> running. Any client, anywhere in the world, using any TimeZone can query
> any Solr server, running in any JVM, and know that the dates it gets back
> are formatted in UTC.
>
> : Code to simulate issue:
>
> I don't actually see any "issue" being simulated in this code; can you
> elaborate on how exactly it's behaving in a way that's inconsistent with
> your expectations? (Making it a JUnit TestCase that uses asserts to fail
> where you are getting data you don't expect is pretty much the universal
> way to describe a bug.)
>
import static org.junit.Assert.assertEquals;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.solr.schema.DateField;
import org.junit.Test;

public class DateFieldTest {

    @Test
    public void shouldReturnSameDateValueWhenDateFieldIsUsedToParseDates()
            throws ParseException {
        // Given
        String originalDateString = "2010-10-10T10:10:10Z";

        // When
        Field field = new Field("field", originalDateString, Store.NO, Index.ANALYZED);
        DateField dateField = new DateField();
        SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        Date originalDateObject = dateFormat.parse(originalDateString);
        Date parsedDate = dateField.toObject(field);

        // Then
        assertEquals(originalDateObject, parsedDate);

        /* TO MAKE TEST PASS
         * Solr 1.3
         *
         * Comment line 271 at org.apache.solr.schema.DateField
         * this.setTimeZone(CANONICAL_TZ);
         */
    }
}
>
> My guess would be that you are getting confused by the fact that
> Date.toString() uses the default time zone of your JVM to generate a string,
> which is why the data getting printed out doesn't match the hardcoded
> value in your code...
>
> : System.out.println(dateField.toObject(field));
>
> but if you take any Date object you want, print its toString(), index it,
> and then take that indexed string representation and convert it back into a
> Date (using dateField.toObject()) you should find that originalDate.equals(newDate).
>
I was expecting this behaviour, and I do get it when performing an HTTP query
where the XMLResponseWriter is used. But the same does not occur when the
BinaryResponseWriter is used.
>
>
>
> -Hoss
>
>
Thanks!
Michel
Re: Environment Timezone being considered when using SolrJ
Posted by Chris Hostetter <ho...@fucit.org>.
: When using SolrJ I've realized document dates are being modified according
: to the environment UTC timezone. The timezone is being set in the inner
: class ISO8601CanonicalDateFormat of DateField class.
The dates aren't "modified" based on UTC; they are formatted in UTC before
being written to the Lucene index so that no matter what the current
locale is the index format is consistent.
: I've read some posts where people say Solr should be most locale and culture
: agnostic. So, what's the purpose for that timezone processing before
The use of UTC is specifically to be agnostic of where the server is
running. Any client, anywhere in the world, using any TimeZone can query
any Solr server, running in any JVM, and know that the dates it gets back
are formatted in UTC.
: Code to simulate issue:
I don't actually see any "issue" being simulated in this code; can you
elaborate on how exactly it's behaving in a way that's inconsistent with
your expectations? (Making it a JUnit TestCase that uses asserts to fail
where you are getting data you don't expect is pretty much the universal
way to describe a bug.)
My guess would be that you are getting confused by the fact that
Date.toString() uses the default time zone of your JVM to generate a string,
which is why the data getting printed out doesn't match the hardcoded
value in your code...
: System.out.println(dateField.toObject(field));
but if you take any Date object you want, print its toString(), index it,
and then take that indexed string representation and convert it back into a
Date (using dateField.toObject()) you should find that originalDate.equals(newDate).
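The round trip described above can be sketched without a Solr dependency. This is an assumption-laden stand-in, not Solr's DateField implementation: the class UtcRoundTrip and its two helpers are hypothetical, and a plain SimpleDateFormat pinned to UTC plays the role of the canonical index format. Because the formatter's zone is fixed, the JVM default time zone does not affect the result.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical stand-in for a canonical UTC date format; not Solr's DateField.
class UtcRoundTrip {
    // Milliseconds are included so that no precision is lost in the string form.
    private static SimpleDateFormat canonicalFormat() {
        SimpleDateFormat format =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format;
    }

    static String toIndexedString(Date date) {
        return canonicalFormat().format(date);
    }

    static Date fromIndexedString(String indexed) {
        try {
            return canonicalFormat().parse(indexed);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + indexed, e);
        }
    }

    public static void main(String[] args) {
        Date originalDate = new Date();
        Date newDate = fromIndexedString(toIndexedString(originalDate));
        System.out.println(originalDate.equals(newDate)); // true
    }
}
```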
-Hoss