Posted to user@tika.apache.org by Alessandro Benedetti <ab...@apache.org> on 2015/05/18 18:29:44 UTC

[Date Format] Render dates in single format

Hi guys,
I would like to know whether there is any config parameter in Tika to
force all dates to be rendered in a specific format, independently of
the parser.
I would like all my dates to be returned in UTC (Zulu).

I want this because I will later index the dates in Solr (I am using
the Apache Tika Transformation connector inside Apache ManifoldCF).

Any suggestions?

-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

Re: [Date Format] Render dates in single format

Posted by Nick Burch <ap...@gagravarr.org>.
On Tue, 19 May 2015, Alessandro Benedetti wrote:
> Is there any way to know which metadata are Dates or not ?

You can find out whether a given property is a date. These days most of
the entries on the Metadata object are properties, as we've been trying
to convert Parsers to use typed Properties wherever possible.

Most of the key dates are set up as properties, e.g. dc:created:
https://tika.apache.org/1.7/api/org/apache/tika/metadata/DublinCore.html#CREATED

> It's very unlucky we can not format all the dates in the same way, in the
> end, when the parser parse the metadata, it knows if it encounter a Date.

All date times are formatted the same way though - ISO-8601. They are 
stored with their native timezone where known, so that users who are 
interested can tell the difference between a file having a time of 12pm 
London time and one with 1pm Paris time.

Tika provides a way to get those as a simple Date if that's all you want,
and Java provides ways to print those dates out in specific timezones if
that's what you want.
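
As a rough illustration of the 12pm London / 1pm Paris point, here is a
minimal sketch that takes ISO-8601 strings carrying an offset and
normalises them to UTC. It uses only java.time from the JDK, not Tika;
the class and method names (IsoToUtc, toUtc) are made up for this
example, and it assumes the string includes an offset (a value without
one would be rejected by OffsetDateTime.parse).

```java
import java.time.OffsetDateTime;

// Hypothetical helper, not part of Tika: normalises an ISO-8601
// date-time string that carries an offset to its UTC ("Zulu") form.
class IsoToUtc {

    static String toUtc(String isoWithOffset) {
        // Parse the string, keeping the offset it was written with.
        OffsetDateTime parsed = OffsetDateTime.parse(isoWithOffset);
        // Instant.toString() renders the same instant in UTC.
        return parsed.toInstant().toString();
    }

    public static void main(String[] args) {
        // 12pm in London (BST, +01:00) and 1pm in Paris (CEST, +02:00)
        // are the same instant, so both lines print the same UTC value.
        System.out.println(toUtc("2015-05-19T12:00:00+01:00"));
        System.out.println(toUtc("2015-05-19T13:00:00+02:00"));
    }
}
```

Both calls in main describe the same moment, which is why the original
offset is worth keeping in the stored string until you decide how to
render it.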

Nick

Re: [Date Format] Render dates in single format

Posted by Alessandro Benedetti <ab...@apache.org>.
Thanks Nick!
Is there any way to know which metadata entries are dates?
With the method you linked, you need to know beforehand which field
is a date in order to get it as a date.
It's unfortunate that we cannot format all the dates in the same way;
after all, when the parser parses the metadata, it knows whether it has
encountered a date.
But I fear that we are not keeping track of that, am I right?

Cheers

2015-05-19 10:14 GMT+01:00 Nick Burch <ap...@gagravarr.org>:

> On Mon, 18 May 2015, Alessandro Benedetti wrote:
>
>> I am interested in understanding if there is any config param in Tika to
>> force the rendering of all dates in a specific format. Independently of the
>> parser.
>>
>
> Nope, you'll need to do it on the output side. The parsers will store the
> dates / date times into the metadata object in the form that they come in
> with, including the timezone where known.
>
> When fetching the metadata values, you can optionally get the date ones as
> a Java date rather than as a string:
>
> https://tika.apache.org/1.7/api/org/apache/tika/metadata/Metadata.html#getDate%28org.apache.tika.metadata.Property%29
>
> If you need things in a specific timezone, format that Date object into one
>
> Nick
>




Re: [Date Format] Render dates in single format

Posted by Nick Burch <ap...@gagravarr.org>.
On Mon, 18 May 2015, Alessandro Benedetti wrote:
> I am interested in understanding if there is any config param in Tika to 
> force the rendering of all dates in a specific format. Independently of 
> the parser.

Nope, you'll need to do it on the output side. The parsers will store the 
dates / date times into the metadata object in the form that they come in 
with, including the timezone where known.

When fetching the metadata values, you can optionally get the date ones as 
a Java date rather than as a string:
https://tika.apache.org/1.7/api/org/apache/tika/metadata/Metadata.html#getDate%28org.apache.tika.metadata.Property%29

If you need the dates in a specific timezone, format that Date object
using that timezone.
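
A minimal sketch of that last step, assuming you already have a
java.util.Date (for example from the getDate call linked above) and
want a UTC string with a trailing Z to hand to Solr. It uses only JDK
classes; the class and method names (UtcDateFormat, formatUtc) and the
exact pattern are assumptions for illustration, not anything Tika
provides.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of Tika: renders a Date in UTC.
class UtcDateFormat {

    static String formatUtc(Date date) {
        // The literal 'Z' is only correct because the formatter's
        // timezone is explicitly set to UTC on the next line.
        SimpleDateFormat fmt =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(date);
    }

    public static void main(String[] args) {
        // In real use the Date would come from the metadata object;
        // here a fixed epoch value stands in for it.
        System.out.println(formatUtc(new Date(0L)));
    }
}
```

A new SimpleDateFormat is created on each call because the class is not
thread-safe; if this runs in a hot path you may prefer one formatter
per thread.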

Nick