Posted to user@tika.apache.org by Alessandro Benedetti <ab...@apache.org> on 2015/05/18 18:29:44 UTC
[Date Format] Render dates in single format
Hi guys,
I would like to know whether Tika has any config parameter that forces
all dates to be rendered in a specific format, independently of the parser.
I would like all my dates to be returned in UTC (Zulu).
I need this because I later want to index the dates in Solr (I am using
the Apache Tika Transformation connector inside Apache ManifoldCF).
Any suggestions?
--
--------------------------
Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti
"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"
William Blake - Songs of Experience -1794 England
Re: [Date Format] Render dates in single format
Posted by Nick Burch <ap...@gagravarr.org>.
On Tue, 19 May 2015, Alessandro Benedetti wrote:
> Is there any way to know which metadata are Dates or not ?
You can find out whether a given property is a date. These days most of
the entries on the Metadata object are properties, as we've been trying
to convert Parsers to use typed Properties wherever possible.
Most of the key dates are set up as properties, e.g. dc:created:
https://tika.apache.org/1.7/api/org/apache/tika/metadata/DublinCore.html#CREATED
> It's unfortunate that we cannot format all the dates in the same way. In
> the end, when the parser parses the metadata, it knows whether it has
> encountered a Date.
All date times are formatted the same way though - ISO-8601. They are
stored with their native timezone where known, so that users who are
interested can tell the difference between a file having a time of 12pm
London time and one with 1pm Paris time.
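To illustrate the point about native timezones, here is a minimal sketch that uses only java.time and no Tika classes. It parses two ISO-8601 strings that carry different offsets and normalizes both to UTC. The class name IsoToUtc and the sample values are made up for illustration, and it assumes each string carries an explicit offset; values stored without a timezone would need separate handling.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Tika.
final class IsoToUtc {
    private IsoToUtc() {}

    /**
     * Parses an ISO-8601 date time that includes an offset and returns the
     * same instant expressed in UTC, e.g. 2015-05-18T11:00:00Z.
     */
    static String toUtc(String isoWithOffset) {
        OffsetDateTime parsed = OffsetDateTime.parse(isoWithOffset);
        return parsed.withOffsetSameInstant(ZoneOffset.UTC)
                     .format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    public static void main(String[] args) {
        // 12pm in London (summer time, +01:00) and 1pm in Paris (+02:00)
        // are the same instant, so both lines print 2015-05-18T11:00:00Z
        System.out.println(toUtc("2015-05-18T12:00:00+01:00"));
        System.out.println(toUtc("2015-05-18T13:00:00+02:00"));
    }
}
```

The original offset is lost after normalization, so anyone who cares about the difference between London and Paris time should keep the stored string as well.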
Tika provides a way to get those as a simple Date if that's all you want,
and Java provides ways to print those dates out in specific timezones if
that's what you want.
Nick
Re: [Date Format] Render dates in single format
Posted by Alessandro Benedetti <ab...@apache.org>.
Thanks Nick!
Is there any way to know which metadata entries are dates?
With the method you linked, you need to know beforehand which field
is a date in order to get it as a date.
It's unfortunate that we cannot format all the dates in the same way. In
the end, when the parser parses the metadata, it knows whether it has
encountered a Date.
But I fear that we are not keeping track of that, am I right?
Cheers
2015-05-19 10:14 GMT+01:00 Nick Burch <ap...@gagravarr.org>:
> On Mon, 18 May 2015, Alessandro Benedetti wrote:
>
>> I am interested in understanding if there is any config param in Tika to
>> force the rendering of all dates in a specific format. Independently of the
>> parser.
>>
>
> Nope, you'll need to do it on the output side. The parsers will store the
> dates / date times into the metadata object in the form that they come in
> with, including the timezone where known.
>
> When fetching the metadata values, you can optionally get the date ones as
> a Java date rather than as a string:
>
> https://tika.apache.org/1.7/api/org/apache/tika/metadata/Metadata.html#getDate%28org.apache.tika.metadata.Property%29
>
> If you need things in a specific timezone, format that Date object in
> that timezone.
>
> Nick
>
Re: [Date Format] Render dates in single format
Posted by Nick Burch <ap...@gagravarr.org>.
On Mon, 18 May 2015, Alessandro Benedetti wrote:
> I am interested in understanding if there is any config param in Tika to
> force the rendering of all dates in a specific format. Independently of
> the parser.
Nope, you'll need to do it on the output side. The parsers will store the
dates / date times into the metadata object in the form that they come in
with, including the timezone where known.
When fetching the metadata values, you can optionally get the date ones as
a Java date rather than as a string:
https://tika.apache.org/1.7/api/org/apache/tika/metadata/Metadata.html#getDate%28org.apache.tika.metadata.Property%29
If you need things in a specific timezone, format that Date object in
that timezone.
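As a rough sketch of the "output side" step, the snippet below formats a java.util.Date as a UTC (Zulu) string of the form yyyy-MM-ddTHH:mm:ssZ, which is the shape Solr date fields expect. It uses only the JDK. The Date is assumed to be one obtained from the getDate method linked above; that call is left out so the example runs without Tika on the classpath. The class name SolrDates and the method toSolrUtc are hypothetical.

```java
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.Date;

// Hypothetical helper, not part of Tika or Solr.
final class SolrDates {
    private SolrDates() {}

    /**
     * Formats the given Date as an ISO-8601 instant in UTC.
     * Milliseconds are dropped so the output always has whole seconds.
     */
    static String toSolrUtc(Date date) {
        return DateTimeFormatter.ISO_INSTANT.format(
                date.toInstant().truncatedTo(ChronoUnit.SECONDS));
    }

    public static void main(String[] args) {
        // In real use the Date would come from the metadata; here it is
        // the epoch, so this prints 1970-01-01T00:00:00Z
        System.out.println(toSolrUtc(new Date(0L)));
    }
}
```

A Date holds an instant without any timezone of its own, so the timezone only comes into play at formatting time, which is why the conversion belongs on the output side.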
Nick