Posted to dev@uima.apache.org by Adam Lally <al...@alum.rpi.edu> on 2006/12/05 23:25:16 UTC

Should the SDK assembly contain all three formats of documentation?

We're currently generating documentation in three formats: PDF,
single-page HTML, and multiple-page HTML.  We can put all three
formats on the website, but should the UIMA SDK distribution contain
all three?

As far as size goes, the 3 books that Marshall has completed take up a
total of about 9MB, zipped, for all three formats.  Most of the space
is in images, but the images are duplicated in the PDF and HTML,
essentially doubling the space requirement versus having only one of
those formats.

The multiple-page HTML doesn't seem all that useful for inclusion in
the SDK.  I think the main point of that is for remote viewing over a
slow connection.  Taking it out wouldn't save much space, though,
since it shares images with the single-page HTML.

-Adam

Re: Should the SDK assembly contain all three formats of documentation?

Posted by Adam Lally <al...@alum.rpi.edu>.
> Apache projects usually also do a pure source distribution, which is
> convenient for downstream distributors who do their own build and
> packaging.  (And binary distributions normally include the source code).
>

Yes, good point.

Some size measurements for our SDK zip (these will increase if we add
the source code):
No documentation: 7982 KB
With PDF only: 13544 KB
With PDF and HTML (single-page): 16783 KB

So HTML appears to add only an extra 3MB or so.  This is only for
three out of the four books, but IIRC the last one (reference guides)
doesn't have many images and so wouldn't be very large.
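
As a quick check of the arithmetic, here is a small Python sketch that derives the per-format cost from the three zip sizes listed above. The dictionary keys and variable names are made up for illustration, and the KB-to-MB conversion assumes 1 MB = 1024 KB.

```python
# Sizes of the SDK zip in KB, as measured above.
# The key names are illustrative, not part of any build script.
sizes_kb = {
    "no_docs": 7982,
    "pdf_only": 13544,
    "pdf_and_html": 16783,
}

# Incremental cost of each documentation format.
pdf_cost_kb = sizes_kb["pdf_only"] - sizes_kb["no_docs"]
html_cost_kb = sizes_kb["pdf_and_html"] - sizes_kb["pdf_only"]


def kb_to_mb(kb):
    """Convert KB to MB, assuming 1 MB = 1024 KB."""
    return kb / 1024


print(f"PDF adds {pdf_cost_kb} KB (about {kb_to_mb(pdf_cost_kb):.1f} MB)")
print(f"Single-page HTML adds {html_cost_kb} KB (about {kb_to_mb(html_cost_kb):.1f} MB)")
```

This gives 5562 KB for the PDF and 3239 KB for the single-page HTML, which matches the "extra 3MB or so" estimate.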

I'm leaning towards including both the PDF and single-page HTML.  Any
objections?

-Adam

Re: Should the SDK assembly contain all three formats of documentation?

Posted by Thilo Goetz <tw...@gmx.de>.
Adam Lally wrote:
> I agree, let's limit how many different kinds of distributions we
> have.  Already we may end up with different distributions for Java and
> C++, and maybe one that combines both.  I wouldn't want to bifurcate
> that again with different documentation sets.

Apache projects usually also do a pure source distribution, which is 
convenient for downstream distributors who do their own build and 
packaging.  (And binary distributions normally include the source code).

--Thilo



Re: Should the SDK assembly contain all three formats of documentation?

Posted by Adam Lally <al...@alum.rpi.edu>.
On 12/6/06, Thilo Goetz <tw...@gmx.de> wrote:
> Marshall Schor wrote:
> > How about multiple distributable package things - one with everything
> > (big),
> > one without the docs, and the docs separately available - each kind?
>
> Or we could determine a minimum set of docs we want to ship (for
> example, the pdf), and everything else is downloadable separately.  I'm
> afraid we'll confuse people if we have that many different versions of
> the distribution.  Opinions?
>

I agree, let's limit how many different kinds of distributions we
have.  Already we may end up with different distributions for Java and
C++, and maybe one that combines both.  I wouldn't want to bifurcate
that again with different documentation sets.

For me the choice is (PDF + single-page HTML) or PDF only.  The only
reason to do PDF only would be to reduce the download size, but
perhaps we should not be too worried about an extra 10MB, or less
since Marshall thinks we can reduce the size of the image files
without sacrificing too much resolution.

-Adam

Re: Should the SDK assembly contain all three formats of documentation?

Posted by Thilo Goetz <tw...@gmx.de>.
Marshall Schor wrote:
> How about multiple distributable package things - one with everything 
> (big),
> one without the docs, and the docs separately available - each kind?

Or we could determine a minimum set of docs we want to ship (for 
example, the pdf), and everything else is downloadable separately.  I'm 
afraid we'll confuse people if we have that many different versions of 
the distribution.  Opinions?

--Thilo



Re: Should the SDK assembly contain all three formats of documentation?

Posted by Marshall Schor <ms...@schor.com>.
Adam Lally wrote:
> We're currently generating documentation in three formats: PDF,
> single-page HTML, and multiple-page HTML.  We can put all three
> formats on the website, but should the UIMA SDK distribution contain
> all three?

How about multiple distributable package things - one with everything (big),
one without the docs, and the docs separately available - each kind?

> As far as size goes, the 3 books that Marshall has completed take up a
> total of about 9MB, zipped, for all three formats.  Most of the space
> is in images, but the images are duplicated in the PDF and HTML,
> essentially doubling the space requirement versus having only one of
> those formats.

The images can be shrunk; there's a way to add an additional step that
uses ghostview (surprise) which can read a PDF and reduce the image
sizes (at some cost in resolution).
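
A rough sketch of what such a step might look like, assuming the tool doing the work is Ghostscript's `gs` command (the engine behind ghostview) and that it is installed. The function name, file names, and the 150 dpi target are hypothetical; the switches are standard Ghostscript pdfwrite options, but whether a given image is actually downsampled also depends on Ghostscript's threshold settings.

```python
import shlex


def build_downsample_command(src_pdf, dst_pdf, dpi=150):
    """Build a Ghostscript argv list that rewrites src_pdf to dst_pdf,
    asking the pdfwrite device to downsample images to roughly `dpi`.

    Hypothetical helper for illustration; not part of the UIMA build.
    """
    return [
        "gs",
        "-sDEVICE=pdfwrite",            # write a new PDF
        "-dNOPAUSE", "-dBATCH", "-dQUIET",
        "-dDownsampleColorImages=true",
        f"-dColorImageResolution={dpi}",
        "-dDownsampleGrayImages=true",
        f"-dGrayImageResolution={dpi}",
        "-dDownsampleMonoImages=true",
        f"-dMonoImageResolution={dpi}",
        f"-sOutputFile={dst_pdf}",
        src_pdf,
    ]


# File names below are placeholders.
cmd = build_downsample_command("book.pdf", "book-small.pdf")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

A lower `dpi` gives a smaller file at a greater cost in resolution, so the value would need tuning against the actual screenshots in the books.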

Also, note that the PDF process is producing images that are way too
big - that's something yet to be figured out and fixed.

-Marshall