Posted to dev@tika.apache.org by Alex Ott <al...@gmail.com> on 2010/05/31 11:13:07 UTC

Re: [jira] Updated: (TIKA-402) Support for Keynote and Pages documents

Re

Martijn v Groningen  at "Mon, 31 May 2010 11:10:28 +0200" wrote:
 MvG> No, I don't support that now. Supporting it is relatively easy: the
 MvG> input stream needs to be wrapped in a GZIPInputStream if the file has
 MvG> a .gz extension. What version of Pages are you using? When I save an
 MvG> iWork file (Pages, Numbers or Keynote), the file never has the .gz
 MvG> extension. Or is it a special option when you save your document
 MvG> in Pages?

I personally don't use Pages; I just got several documents from friends
to test my own code. I can send them to you.

-- 
With best wishes, Alex Ott, MBA
http://alexott.blogspot.com/           http://alexott.net
http://alexott-ru.blogspot.com/
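The approach Martijn describes above (wrap the input stream in a GZIPInputStream when the file name ends in .gz) can be sketched in plain Java. This is a minimal illustration using only java.util.zip, not Tika's actual iWork parser code; the class name GzipAwareStreams, the helpers openMaybeGzipped, readAll and gzip, and the example file names are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipAwareStreams {

    // Hypothetical helper: wraps the raw stream in a GZIPInputStream when the
    // file name ends with ".gz" (case-insensitive); otherwise returns it as-is.
    public static InputStream openMaybeGzipped(String fileName, InputStream raw) throws IOException {
        if (fileName.toLowerCase(Locale.ROOT).endsWith(".gz")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Reads the whole stream and decodes it as UTF-8 (demo only).
    public static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Gzip-compresses a string in memory to simulate a ".gz" file.
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } // closing the GZIPOutputStream writes the gzip trailer
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String xml = "<doc/>";
        // Compressed input, selected by the ".gz" suffix.
        try (InputStream in = openMaybeGzipped("example.xml.gz",
                new ByteArrayInputStream(gzip(xml)))) {
            System.out.println(readAll(in));
        }
        // Uncompressed input is passed through unchanged.
        try (InputStream in = openMaybeGzipped("example.xml",
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))) {
            System.out.println(readAll(in));
        }
    }
}
```

Deciding by extension mirrors what the thread describes; since the extension may be missing (as Martijn observes for his own files), a more defensive variant could instead peek at the first two bytes and check for the gzip magic number 0x1f 0x8b.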

Re: [jira] Updated: (TIKA-402) Support for Keynote and Pages documents

Posted by Alex Ott <al...@gmail.com>.
Re

Martijn v Groningen  at "Mon, 31 May 2010 23:10:56 +0200" wrote:
 MvG> Could you share links to the iWork formats?
 MvG> Keynote and Pages (for '09):
 MvG> http://developer.apple.com/mac/library/documentation/AppleApplications/Conceptual/iWork2-0_XML/Chapter01/Introduction.html
 MvG> I have not found one for Numbers.

Thank you

 MvG> P.S. Does the Tika project have a collection of links (or files)
 MvG> describing the different formats it supports?
 MvG> Do you mean this? http://tika.apache.org/0.7/formats.html

Yes, something like that. But these are links to the libraries that
implement support for specific formats, not to the formats themselves.

I mean something like
http://msdn.microsoft.com/en-us/library/cc313118%28office.12%29.aspx for
the MS Office file formats, a pointer to the ODF spec, etc.


Re: [jira] Updated: (TIKA-402) Support for Keynote and Pages documents

Posted by Martijn v Groningen <ma...@gmail.com>.
Could you share links to the iWork formats?
Keynote and Pages (for '09):
http://developer.apple.com/mac/library/documentation/AppleApplications/Conceptual/iWork2-0_XML/Chapter01/Introduction.html
I have not found one for Numbers.

P.S. Does the Tika project have a collection of links (or files)
describing the different formats it supports?
Do you mean this? http://tika.apache.org/0.7/formats.html

On 31 May 2010 21:17, Alex Ott <al...@gmail.com> wrote:
> Re Martijn
>
> Martijn v Groningen  at "Mon, 31 May 2010 20:44:37 +0200" wrote:
>  MvG> I've checked the documents you sent me. They are from an older
>  MvG> version of Pages, which the code currently does not support. I also
>  MvG> can't find any documentation for this format, so I guess it will have
>  MvG> to be reverse engineered (just like the current Numbers format). Apple
>  MvG> does have some documentation for the current Pages format.
>
> Could you share links to the iWork formats?
>
> P.S. Does the Tika project have a collection of links (or files)
> describing the different formats it supports?



-- 
Kind regards,

Martijn van Groningen

Re: [jira] Updated: (TIKA-402) Support for Keynote and Pages documents

Posted by Alex Ott <al...@gmail.com>.
Re Martijn

Martijn v Groningen  at "Mon, 31 May 2010 20:44:37 +0200" wrote:
 MvG> I've checked the documents you sent me. They are from an older
 MvG> version of Pages, which the code currently does not support. I also
 MvG> can't find any documentation for this format, so I guess it will have
 MvG> to be reverse engineered (just like the current Numbers format). Apple
 MvG> does have some documentation for the current Pages format.

Could you share links to the iWork formats?

P.S. Does the Tika project have a collection of links (or files)
describing the different formats it supports?


Re: [jira] Updated: (TIKA-402) Support for Keynote and Pages documents

Posted by Martijn v Groningen <ma...@gmail.com>.
I've checked the documents you sent me. They are from an older
version of Pages, which the code currently does not support. I also
can't find any documentation for this format, so I guess it will have
to be reverse engineered (just like the current Numbers format). Apple
does have some documentation for the current Pages format.

On 31 May 2010 11:13, Alex Ott <al...@gmail.com> wrote:
> Re
>
> Martijn v Groningen  at "Mon, 31 May 2010 11:10:28 +0200" wrote:
>  MvG> No, I don't support that now. Supporting it is relatively easy: the
>  MvG> input stream needs to be wrapped in a GZIPInputStream if the file has
>  MvG> a .gz extension. What version of Pages are you using? When I save an
>  MvG> iWork file (Pages, Numbers or Keynote), the file never has the .gz
>  MvG> extension. Or is it a special option when you save your document
>  MvG> in Pages?
>
> I personally don't use Pages; I just got several documents from friends
> to test my own code. I can send them to you.


