Posted to dev@tika.apache.org by Alex Ott <al...@gmail.com> on 2010/05/31 11:13:07 UTC
Re: [jira] Updated: (TIKA-402) Support for Keynote and Pages documents
Re
Martijn v Groningen at "Mon, 31 May 2010 11:10:28 +0200" wrote:
MvG> No, I don't support that now. Supporting it is relatively easy: the
MvG> input stream needs to be wrapped in a GZIPInputStream if the file name
MvG> ends with the .gz extension. What version of Pages are you using? When I
MvG> save an iWork file (Pages, Numbers or Keynote), the file never has the
MvG> .gz extension. Or is there a special option when you save your document
MvG> in Pages?
I don't use Pages myself; I just got several documents from friends to
test my own code. I can send them to you.
--
With best wishes, Alex Ott, MBA
http://alexott.blogspot.com/ http://alexott.net
http://alexott-ru.blogspot.com/
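Martijn's suggestion above, wrapping the input stream in a GZIPInputStream when the file name ends in .gz, can be sketched in plain Java as follows. This is a minimal sketch, not Tika's actual parser code: the class name and the maybeDecompress helper are hypothetical, and the extra check for the gzip magic bytes (0x1f 0x8b) is an assumption added because, as Martijn notes, saved iWork files do not necessarily carry a .gz extension.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipAwareStreams {

    /**
     * Hypothetical helper: returns a stream of uncompressed bytes.
     * Wraps the input in a GZIPInputStream when the file name ends in ".gz"
     * (the check described in the thread) or, as an extra assumption, when
     * the first two bytes are the gzip magic number 0x1f 0x8b.
     * If the name ends in ".gz" but the data is not gzip, the
     * GZIPInputStream constructor throws a ZipException.
     */
    static InputStream maybeDecompress(String fileName, InputStream in) throws IOException {
        // BufferedInputStream supports mark/reset, so we can peek at the header.
        BufferedInputStream buffered = new BufferedInputStream(in);
        buffered.mark(2);
        int b0 = buffered.read();
        int b1 = buffered.read();
        buffered.reset(); // rewind so the caller sees the stream from its first byte

        boolean gzipMagic = b0 == 0x1f && b1 == 0x8b;
        boolean gzipName = fileName != null
                && fileName.toLowerCase(Locale.ROOT).endsWith(".gz");
        return (gzipMagic || gzipName) ? new GZIPInputStream(buffered) : buffered;
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = "<document/>".getBytes(StandardCharsets.UTF_8);

        // Build a gzip-compressed copy in memory to exercise both branches.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(xml);
        }
        byte[] compressed = bytes.toByteArray();

        try (InputStream plain = maybeDecompress("index.xml", new ByteArrayInputStream(xml));
             InputStream unzipped = maybeDecompress("index.xml.gz", new ByteArrayInputStream(compressed))) {
            System.out.println(new String(plain.readAllBytes(), StandardCharsets.UTF_8));
            System.out.println(new String(unzipped.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

Peeking at the header through BufferedInputStream's mark/reset leaves the stream positioned at its start, so the caller reads the same bytes whichever branch is taken.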
Re: [jira] Updated: (TIKA-402) Support for Keynote and Pages documents
Posted by Alex Ott <al...@gmail.com>.
Re
Martijn v Groningen at "Mon, 31 May 2010 23:10:56 +0200" wrote:
MvG> Could you share links to the iWork format documentation?
MvG> Keynote and Pages (for '09):
MvG> http://developer.apple.com/mac/library/documentation/AppleApplications/Conceptual/iWork2-0_XML/Chapter01/Introduction.html
MvG> I have not found one for Numbers.
Thank you
MvG> P.S. Does the Tika project have a collection of links (or files) describing
MvG> the different formats it supports?
MvG> Do you mean this? http://tika.apache.org/0.7/formats.html
Yes, something like that. But these are links to the libraries that
implement support for specific formats, not to the formats themselves.
I mean something like
http://msdn.microsoft.com/en-us/library/cc313118%28office.12%29.aspx for the
MS Office file formats, a pointer to the ODF spec, etc.
--
With best wishes, Alex Ott, MBA
http://alexott.blogspot.com/ http://alexott.net
http://alexott-ru.blogspot.com/
Re: [jira] Updated: (TIKA-402) Support for Keynote and Pages documents
Posted by Martijn v Groningen <ma...@gmail.com>.
> Could you share links to the iWork format documentation?
Keynote and Pages (for '09):
http://developer.apple.com/mac/library/documentation/AppleApplications/Conceptual/iWork2-0_XML/Chapter01/Introduction.html
I have not found one for Numbers.
> P.S. Does the Tika project have a collection of links (or files) describing
> the different formats it supports?
Do you mean this? http://tika.apache.org/0.7/formats.html
On 31 May 2010 21:17, Alex Ott <al...@gmail.com> wrote:
> Re Martijn
>
> Martijn v Groningen at "Mon, 31 May 2010 20:44:37 +0200" wrote:
> MvG> I've checked the documents you sent me. The documents are from an
> MvG> older version of Pages, which the code currently does not support. I
> MvG> also can't find documentation for this format, so I guess it has to be
> MvG> reverse engineered (just like the current Numbers format). Apple does
> MvG> have some documentation for the current Pages format.
>
> Could you share links to the iWork format documentation?
>
> P.S. Does the Tika project have a collection of links (or files) describing
> the different formats it supports?
>
> --
> With best wishes, Alex Ott, MBA
> http://alexott.blogspot.com/ http://alexott.net/
> http://alexott-ru.blogspot.com/
>
--
Kind regards,
Martijn van Groningen
Re: [jira] Updated: (TIKA-402) Support for Keynote and Pages documents
Posted by Alex Ott <al...@gmail.com>.
Re Martijn
Martijn v Groningen at "Mon, 31 May 2010 20:44:37 +0200" wrote:
MvG> I've checked the documents you sent me. The documents are from an
MvG> older version of Pages, which the code currently does not support. I
MvG> also can't find documentation for this format, so I guess it has to be
MvG> reverse engineered (just like the current Numbers format). Apple does
MvG> have some documentation for the current Pages format.
Could you share links to the iWork format documentation?
P.S. Does the Tika project have a collection of links (or files) describing
the different formats it supports?
--
With best wishes, Alex Ott, MBA
http://alexott.blogspot.com/ http://alexott.net/
http://alexott-ru.blogspot.com/
Re: [jira] Updated: (TIKA-402) Support for Keynote and Pages documents
Posted by Martijn v Groningen <ma...@gmail.com>.
I've checked the documents you sent me. The documents are from an older
version of Pages, which the code currently does not support. I also can't
find documentation for this format, so I guess it has to be reverse
engineered (just like the current Numbers format). Apple does have some
documentation for the current Pages format.
On 31 May 2010 11:13, Alex Ott <al...@gmail.com> wrote:
> Re
>
> Martijn v Groningen at "Mon, 31 May 2010 11:10:28 +0200" wrote:
> MvG> No, I don't support that now. Supporting it is relatively easy: the
> MvG> input stream needs to be wrapped in a GZIPInputStream if the file name
> MvG> ends with the .gz extension. What version of Pages are you using? When I
> MvG> save an iWork file (Pages, Numbers or Keynote), the file never has the
> MvG> .gz extension. Or is there a special option when you save your document
> MvG> in Pages?
>
> I don't use Pages myself; I just got several documents from friends to
> test my own code. I can send them to you.
>
> --
> With best wishes, Alex Ott, MBA
> http://alexott.blogspot.com/ http://alexott.net
> http://alexott-ru.blogspot.com/
>
--
Kind regards,
Martijn van Groningen
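For the older Pages documents that, per Martijn, would have to be reverse engineered, a reasonable first step is to see what the container actually holds. The sketch below assumes the document is a single ZIP file (a document saved as a bundle directory would need Files.walk instead). PackageInspector is a hypothetical name, and the suffixes it flags (.xml, .apxl, .gz) are guesses at where the document XML might live, not documented rules for any iWork version.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class PackageInspector {

    // Hypothetical heuristic: these suffixes are guesses at where the document
    // XML lives, not a documented rule for any particular iWork version.
    static boolean looksLikeDocumentXml(String entryName) {
        String n = entryName.toLowerCase(Locale.ROOT);
        return n.endsWith(".xml") || n.endsWith(".apxl") || n.endsWith(".gz");
    }

    // Returns the names of all entries in a ZIP container, in stream order.
    // A non-ZIP input yields an empty list, because getNextEntry() returns null
    // when no local file header is found.
    static List<String> listEntries(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
                zip.closeEntry();
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PackageInspector <document>");
            return;
        }
        Path path = Paths.get(args[0]);
        try (InputStream in = Files.newInputStream(path)) {
            // Mark candidate XML entries with "*" so they stand out in the listing.
            for (String name : listEntries(in)) {
                System.out.println((looksLikeDocumentXml(name) ? "* " : "  ") + name);
            }
        }
    }
}
```

With Java 11 or later, `java PackageInspector.java some-document.pages` runs it directly; an empty listing suggests the file is not a ZIP container at all.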