
Posted to fop-dev@xmlgraphics.apache.org by mehdi houshmand <me...@gmail.com> on 2012/03/01 11:50:34 UTC

Google Summer of Code

Hi,

We're thinking of submitting a proposal or two to the Google Summer of
Code project and wanted to get some input from the community on ideas.
Once we've got a few proposals I'll create a wiki page and put all the
ideas on there, but for now I just wanted to gauge interest.

In terms of mentoring, I'm happy to be a mentor and I've registered as
one; if any other committers fancy the job, do register, the more
the merrier. The deadline is 9th March, so that doesn't give us long
to bounce around ideas, but here are a few I was thinking of:

- There have been recent discussions between Jeremias, myself and
others about extracting the Fonts packages into their own library. I
think this would be a great idea for a project because essentially it
only involves a few well-defined specifications (TTF, Type1 etc.) and
doesn't expose the person to too much complexity. I'd suggest doing
this by rewriting rather than porting: that gives the person much more
flexibility, and the current code would still give them good tips and
tricks on how to deal with parsing fonts.

- TTF in AFP. I know we still have the TrueTypeInPostScript branch
flying around, and however much I'd like to fob that onto someone
else, I don't think it's fair to do so. I have no idea how long this
project would take, but I think FOP could really benefit from it.
Currently we're forcing users to use AFP fonts for AFP documents, a
lot of which are archaic and use EBCDIC (for those of you who haven't
been exposed to EBCDIC, count yourselves lucky).

- There may also be something to do with PCL. I'm not at all familiar
with the format, but I do remember discussions about upgrading to a
newer PCL standard. I'd be happy to acquaint myself with the format if
there's interest in the idea.

Hopefully we can get a proposal together in time.

Mehdi

Fwd: Google Summer of Code

Posted by mehdi houshmand <me...@gmail.com>.

---------- Forwarded message ----------
From: mehdi houshmand <me...@gmail.com>
Date: 6 March 2012 10:12
Subject: Fwd: Google Summer of Code
To: fop-dev@xmlgraphics.apache.org

I fat-fingered the reply button instead of reply-to-all... *face-palm*

---------- Forwarded message ----------
From: mehdi houshmand <me...@gmail.com>
Date: 6 March 2012 09:33
Subject: Re: Google Summer of Code
To: Craig Ringer <ri...@ringerc.id.au>

On 6 March 2012 00:16, Craig Ringer <ri...@ringerc.id.au> wrote:
> On 03/05/2012 09:35 PM, mehdi houshmand wrote:
>>
>> Because of the overwhelming popularity of this idea, I've created a
>> link on the Wiki
>> (http://wiki.apache.org/xmlgraphics-fop/GoogleSummerOfCode2012) for
>> the GSoC proposals.
>>
> Things that come to mind for me:
>
> - PDFBox backend (probably ideal for GSoC, nice and self contained, great
> for someone who knows PDFBox and wants to learn fop's codebase);
>
> - CID fonts in PostScript (good for someone who knows PS and fonts, not
> necessarily XSL-FO so much);

There is already a big body of work that does this, check the
TrueTypeInPostScript branch as well as the patch
https://issues.apache.org/bugzilla/show_bug.cgi?id=50483. This stuff
needs to be merged into trunk and we do have that on our agenda,
but... I don't make the rules.

>
> - Using automatic +- kerning, +- tracking *and* +- horizontal type scaling
> adjustment to better auto-fit text, involving support for font-stretch
> property. This touches on layout so it may not be practical for a 1st fop
> project, but may not be too bad since fop already adjusts tracking when
> justifying text. The key interest points would be *negative* tracking,
> kerning and (if nothing else works) glyph-scaling for tighter type-fitting
> where it's not desirable to break to a new line due to widow/orphan policy
> or because it'd create large holes. This is particularly important when long
> unbreakable words must fit a fixed width space.

This sounds pretty interesting!! Could you put this and maybe a little
more information in a proposal similar to
https://issues.apache.org/jira/browse/COMDEV-66 or
https://issues.apache.org/jira/browse/COMDEV-67 and I'll create a JIRA
issue.

>
> - PDF/X-1a with CMYK;

I have no idea what is involved here; it sounds like a lot of time in
the spec and battling FOP, but as I said, those are baseless assumptions.
Is that an interesting project?

>
> - Anything in the proposed XSL-FO 2.0 feature list (though most of it won't
> be realistic for GSoC projects);
>
> - Merge fop-pdf-image and implement smart merging of font, profile, and
> image resources. I'm working on this one at the moment, but slowly and only
> amid other projects.

I really don't think that's a suitable project. I responded to your
post, so maybe we could take this conversation elsewhere, but this
really isn't FOP's responsibility, or for that matter the
pdf-image-plugin's. If anything, I'd argue that's a PDFBox project;
Adobe Acrobat Pro does this kind of thing (badly, may I add) as a
post-process action and I think that's the correct way to do it. The
other thing to say is that a newcomer may not appreciate the
importance of fidelity where fonts are concerned. Basically it's too
difficult for a student given a few months and no previous experience.

>
> --
> Craig Ringer

Re: Fwd: Google Summer of Code

Posted by Craig Ringer <ri...@ringerc.id.au>.

On 03/06/2012 07:29 PM, Chris Bowditch wrote:
> On 06/03/2012 11:08, mehdi houshmand wrote:
>
> Hi Mehdi,
>
>> Font de-duping is intrinsically a post-process action: you need the
>> full document, with all fonts, before you can do any font de-duping.
>> PostScript does this very thing (to a much lesser extent) with the
>> <optimize-resources> tag, as a post-process action.
> At least that is transparent to the user, but re-parsing the input is
> a sub-optimal solution because it incurs a performance penalty, so we
> should investigate whether there are alternatives first. I can't recall
> why the PostScript Painter/Renderer was architected that way, but
> that's a separate topic.

At a guess, because PostScript is much less capable of non-linear 
references and access than PDF is. It's more expensive and slower to 
forward-reference resources because PostScript has to parse and execute 
all the rest of the document to find the resource it wants, while PDF 
just seeks to the object at the byte offset referenced in the xref table 
and reads only the object it requires.
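To make the random-access point concrete: a PDF writer records the byte offset of every indirect object as it writes it, and the xref table lists those offsets so a reader can seek straight to the object it needs. The Java sketch below is a minimal illustration under that assumption; `XrefSketch` and its methods are hypothetical and are not FOP's PDF library API. It writes a classic (uncompressed) xref section whose entries are the fixed 20-byte lines the PDF format uses; the trailer and `startxref` are omitted for brevity.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal writer: records object offsets and emits a classic xref table. */
public class XrefSketch {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private final List<Integer> offsets = new ArrayList<>();

    private void write(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        out.write(bytes, 0, bytes.length);
    }

    public void writeHeader() {
        write("%PDF-1.4\n");
    }

    /** Writes "n 0 obj ... endobj", remembering where it starts; returns the object number. */
    public int addObject(String body) {
        int number = offsets.size() + 1;   // object 0 is reserved as the free-list head
        offsets.add(out.size());           // byte offset of "n 0 obj"
        write(number + " 0 obj\n" + body + "\nendobj\n");
        return number;
    }

    /** Writes the xref section (fixed 20-byte entries) and returns its own byte offset. */
    public int writeXref() {
        int xrefOffset = out.size();
        write("xref\n0 " + (offsets.size() + 1) + "\n");
        write("0000000000 65535 f\r\n");
        for (int offset : offsets) {
            write(String.format("%010d 00000 n\r\n", offset));
        }
        return xrefOffset;
    }

    public String contents() {
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        XrefSketch writer = new XrefSketch();
        writer.writeHeader();
        writer.addObject("<< /Type /Catalog >>");
        writer.addObject("<< /Type /Font >>");
        writer.writeXref();
        System.out.print(writer.contents());
    }
}
```

A reader holding this table can jump to offset 45 to read object 2 without touching object 1, which is exactly what PostScript cannot do.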

>>
> The requirements are perfectly clear: given a set of input PDFs and
> XSL-FO, create a single merged PDF with a consistent and unduplicated
> set of fonts. Why would there be slight kerning differences if the
> assumption that the font name is unique holds true?
Assuming the font name is unique is dangerous, since it's provably true 
that in the wild there are numerous subtly (and sometimes grossly) 
different fonts with the same name.

The font dictionary contains glyph metrics information that along with 
the font name, slant, weight etc can be used to match the font rather 
more closely. For extra caution, checksums of subset glyphs can be done 
to make sure they're *identical*, but honestly that's unnecessary if the 
metrics match.
> If that assumption is wrong then I agree with what you say. Ultimately
> that should be down to the user though; they know their fonts, so they
> can decide whether to merge them or not via a setting in the
> fop.xconf. Your argument is not sufficient to say this approach should
> never be used. It brings a lot of benefit to users who know their font
> names are unique.
It should be safe to do automatically and transparently by default,
because only partially overlapping subsets of identical fonts should
ever be merged. Anything else is substitution, not the merging of
duplicate subsets, and has entirely different considerations because
of the possibility of visible changes caused by non-matching metrics etc.
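A minimal sketch of the matching rule described above, assuming each embedded subset's name, weight, italic angle and per-glyph widths have already been read from its font dictionary and font descriptor. Because subsets only partially overlap, the check compares widths only for glyphs present in both; the six-letter subset tag (e.g. `ABCDEF+`) is stripped before comparing names. `SubsetMatcher` and `EmbeddedSubset` are hypothetical names, and a glyph-data checksum could be added as an extra test for the truly cautious.

```java
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical check: can two embedded subsets be treated as cut from the same font? */
public final class SubsetMatcher {
    // Subset fonts in PDF carry a six-uppercase-letter tag plus '+', e.g. "ABCDEF+Helvetica".
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+");

    /** What we assume has been read from each subset's font dictionary and descriptor. */
    public record EmbeddedSubset(String baseFont, int weight, double italicAngle,
                                 Map<Integer, Integer> widths) { }

    private SubsetMatcher() { }

    public static String stripTag(String baseFont) {
        return SUBSET_TAG.matcher(baseFont).replaceFirst("");
    }

    /** Same name and style, and every glyph present in both subsets has the same width. */
    public static boolean compatible(EmbeddedSubset a, EmbeddedSubset b) {
        if (!stripTag(a.baseFont()).equals(stripTag(b.baseFont()))
                || a.weight() != b.weight()
                || Double.compare(a.italicAngle(), b.italicAngle()) != 0) {
            return false;
        }
        for (Map.Entry<Integer, Integer> entry : a.widths().entrySet()) {
            Integer otherWidth = b.widths().get(entry.getKey());
            if (otherWidth != null && !otherWidth.equals(entry.getValue())) {
                return false; // same name but different metrics: not a duplicate
            }
        }
        // A checksum of the shared glyph outlines could be compared here for extra caution.
        return true;
    }
}
```

Anything that fails this check is left alone, which keeps the operation a merge of duplicates rather than a substitution.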

--
Craig Ringer

Re: Fwd: Google Summer of Code

Posted by Chris Bowditch <bo...@hotmail.com>.

On 06/03/2012 11:08, mehdi houshmand wrote:

Hi Mehdi,

> Font de-duping is intrinsically a post-process action: you need the
> full document, with all fonts, before you can do any font de-duping.
> PostScript does this very thing (to a much lesser extent) with the
> <optimize-resources> tag, as a post-process action.
At least that is transparent to the user, but re-parsing the input is a
sub-optimal solution because it incurs a performance penalty, so we
should investigate whether there are alternatives first. I can't recall
why the PostScript Painter/Renderer was architected that way, but that's
a separate topic.
>
> Also, the requirements aren't clear here; what is it we want? Let
> me validate that: this shouldn't change the (I guess we can call it)
> "canonical" PDF document. By that I mean if you rasterized a PDF
> before and after this change they should be identical,
> pixel-for-pixel. When Acrobat does the font de-duping (I don't
> remember how much control it gives you, but if there are levels of
> de-duping I would have chosen the most aggressive), the documents
> aren't identical. There are aberrations caused by slight kerning
> differences between various versions of Arial. This may seem trivial
> when compared to bloated PDFs, but it looks tacky and lowers the high
> standard of documents. You could argue this could be configurable...
> But then I'd reiterate my first argument: this is a post-process
> action, not the concern of FOP or the pdf-image-plugin.
The requirements are perfectly clear: given a set of input PDFs and XSL-FO,
create a single merged PDF with a consistent and unduplicated set of
fonts. Why would there be slight kerning differences if the assumption
that the font name is unique holds true? If that assumption is wrong
then I agree with what you say. Ultimately that should be down to the
user though; they know their fonts, so they can decide whether to merge
them or not via a setting in the fop.xconf. Your argument is not
sufficient to say this approach should never be used. It brings a lot of
benefit to users who know their font names are unique.
>
> The other issue is you have subset fonts created by FOP as well as
> those imported by the pdf-image-plugin. You'd have to create some
> bridge between the image loading framework and the font loading system
> *cough* HACK *cough*. Alternatively, just thinking aloud here, if this
> was done as a post-process *wink* *wink* *wry smile*...
Jeremias and Craig have already sent e-mails on this topic. It is
perfectly valid for any image loaded via the image loading framework to
pass around contextual information. If the changes are done properly
then it is not a hack. Sure, there are some easy ways to do it that
would qualify as a hack, but I prefer to follow the approach outlined by
Jeremias in one of his off-list e-mails about storing contextual
information for images loaded via the image loading framework.
>
> Apologies if I seem to be argumentative here, it's not my
> intention, but I feel this would be serious scope creep. I see the
> pdf-image-plugin as a plugin that treats PDFs as images, nothing more.
> If you want to stitch together PDFs, PDFBox is designed just for that.
It's true that this work touches more than FOP, but I don't see that as 
a good argument against using this as a GSoC project. All the code that 
this touches is open source, with the exception of the image loader 
plug-in and that is something the PMC is discussing with Jeremias.

Thanks,

Chris

>
> Mehdi
>
> On 6 March 2012 10:36, Chris Bowditch <bo...@hotmail.com> wrote:
>> On 06/03/2012 10:12, mehdi houshmand wrote:
>>> I fat-fingered the reply button instead of reply-to-all... *face-palm*
>>
>> Mehdi, Craig,
>> <snip/>
>>
>>
>>
>>>> - Anything in the proposed XSL-FO 2.0 feature list (though most of it
>>>> won't be realistic for GSoC projects);
>>>>
>>>> - Merge fop-pdf-image and implement smart merging of font, profile, and
>>>> image resources. I'm working on this one at the moment, but slowly and
>>>> only amid other projects.
>>> I really don't think that's a suitable project. I responded to your
>>> post, so maybe we could take this conversation elsewhere, but this
>>> really isn't FOP's responsibility, or for that matter the
>>> pdf-image-plugin's. If anything, I'd argue that's a PDFBox project;
>>> Adobe Acrobat Pro does this kind of thing (badly, may I add) as a
>>> post-process action and I think that's the correct way to do it. The
>>> other thing to say is that a newcomer may not appreciate the
>>> importance of fidelity where fonts are concerned. Basically it's too
>>> difficult for a student given a few months and no previous experience.
>> Sorry Mehdi, I don't agree. I think this would be a great project. Craig
>> already outlined what needs to be done and there's a lot of stuff in XGC and
>> FOP as well as the plug-in. I'm not sure anything is needed in PDFBox, but
>> even if it is, that is an Apache project too and the student can submit
>> patches there. Adobe Acrobat may make some assumptions that don't always hold
>> true, but our customers are crying out for FOP to create smaller PDF files
>> when importing multiple PDF images with embedded fonts. This also feels
>> reasonably well defined thanks to Craig's list of TODOs and feels like it
>> can be done in 3 months. It gets a +1 from me.
>>
>> Thanks,
>>
>> Chris
>>
>>>> --
>>>> Craig Ringer
>>>
>

Re: Fwd: Google Summer of Code

Posted by Craig Ringer <ri...@ringerc.id.au>.

My reply is interleaved below, but there's something important to cover 
before reading on.

There's clearly a difference in what I mean by de-duplication vs what 
you're thinking I mean by de-duplication. As far as I can tell you're 
looking at font substitution and un/re-embedding, where (eg) Helvetica 
LT Std is replaced with Helvetica Neue Sans, a different version of 
Helvetica LT Std, the built-in Helvetica derived from Adobe's 
multi-master fonts, or whatever. The replacement font might not have 
matching metrics and certainly wouldn't be identical.

That's *not* what I'm talking about. I'm talking about the case where 
multiple embedded subsets derived from the *exact* *same* *font* exist, 
each containing partially overlapping sets of glyphs where each glyph is 
*identical* to those in the other subsets.

This is best illustrated by example. Take three input PDFs that are
being placed as images (say, engineering diagrams, advertisements or
breakouts in a layout, or whatever), named "1.pdf", "2.pdf" and "3.pdf"
that will be written into "out.pdf". For the sake of this example,
presume that content in "out.pdf" uses "Arial Regular" for its own text
so that font must also be embedded.

1.pdf:
        Helvetica Neue Sans subset [a cde  h]
        Utopia Black               [abcd]
2.pdf:
        Helvetica Neue Sans subset [abcde   ]
        Helvetica LT Std           [ab def  ijk]
3.pdf:
        Helvetica Neue Sans subset [  c efgh]

Desired output is:

out.pdf:
        Helvetica Neue Sans subset [abcdefgh]
        Utopia Black               [abcd]
        Helvetica LT Std           [ab def  ijk]
        Arial Regular              (whatever the text in out.pdf requires)

Fop and fop-pdf-image currently produce:

out.pdf:
        Helvetica Neue Sans subset [a cde  h]
        Helvetica Neue Sans subset [abcde   ]
        Helvetica Neue Sans subset [  c efgh]
        Utopia Black               [abcd]
        Helvetica LT Std           [ab def  ijk]
        Arial Regular              (whatever the text in out.pdf requires)

... meaning that there are 3 copies each of h.n.s "c" and "e", plus 2
copies each of "a", "d" and "h", from *identical* fonts (presuming each
input had the same version of h.n.s, as verified by metrics or, for the
truly paranoid, even glyph data checksums). You appear to think I want to produce:

out.pdf:
        Helvetica Neue Sans        [abcdefghijk]
        Utopia Black               [abcd]
        Arial Regular              (whatever the text in out.pdf requires)

or even:

out.pdf:
        Arial Regular              (out.pdf glyph usage plus [abcdefghijk])
        Utopia Black               [abcd]

... where Helvetica Neue Sans and Helvetica LT Std are "de-duplicated" 
despite not being true duplicates of each other, or in the latter case 
both are replaced with the "equivalent" (approximately) Arial Regular.

That is *not* what I want; that would be completely incorrect to do 
automatically.
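The arithmetic in the example can be checked with a short sketch. It assumes the subsets have already been confirmed to come from the same font (for instance with a name-and-metrics check), so the font name alone serves as the merge key; `SubsetMerge` and its methods are hypothetical. `copies` counts embedded copies of each glyph for one font, and `merge` unions the glyph sets per font, which reproduces the desired output: one Helvetica Neue Sans subset [abcdefgh], while Helvetica LT Std stays separate.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch of merging partially overlapping subsets of the same font. */
public final class SubsetMerge {
    /** One embedded subset: the font it was cut from and the glyphs it contains. */
    public record Subset(String font, Set<Character> glyphs) { }

    /** Counts how many embedded copies of each glyph exist for one font. */
    public static Map<Character, Integer> copies(List<Subset> subsets, String font) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (Subset s : subsets) {
            if (s.font().equals(font)) {
                for (char g : s.glyphs()) {
                    counts.merge(g, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Unions the glyph sets of subsets sharing a font key; distinct fonts stay separate. */
    public static Map<String, Set<Character>> merge(List<Subset> subsets) {
        Map<String, Set<Character>> merged = new TreeMap<>();
        for (Subset s : subsets) {
            merged.computeIfAbsent(s.font(), k -> new TreeSet<>()).addAll(s.glyphs());
        }
        return merged;
    }

    public static Set<Character> glyphs(String chars) {
        Set<Character> set = new TreeSet<>();
        for (char c : chars.toCharArray()) {
            set.add(c);
        }
        return set;
    }

    public static void main(String[] args) {
        List<Subset> inputs = List.of(
            new Subset("Helvetica Neue Sans", glyphs("acdeh")),   // 1.pdf
            new Subset("Utopia Black", glyphs("abcd")),          // 1.pdf
            new Subset("Helvetica Neue Sans", glyphs("abcde")),   // 2.pdf
            new Subset("Helvetica LT Std", glyphs("abdefijk")),   // 2.pdf
            new Subset("Helvetica Neue Sans", glyphs("cefgh")));  // 3.pdf
        // prints {a=2, b=1, c=3, d=2, e=3, f=1, g=1, h=2}
        System.out.println(copies(inputs, "Helvetica Neue Sans"));
        System.out.println(merge(inputs));
    }
}
```

Keeping Helvetica LT Std under its own key is the point: only subsets of the same font are unioned, and nothing is substituted.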

On 03/06/2012 07:08 PM, mehdi houshmand wrote:
> Font de-duping is intrinsically a post-process action: you need the
> full document, with all fonts, before you can do any font de-duping.
> PostScript does this very thing (to a much lesser extent) with the
> <optimize-resources> tag, as a post-process action.
I absolutely disagree that font optimization must be done in a second pass.

Font de-duplication requires knowledge of all the fonts in the document, 
yes. That doesn't make it necessarily a post-process operation. PDF is a 
wonderfully non-linear format, and it's trivial to delay writing out 
fonts until the end of the document. PDF simply doesn't care where the 
fonts appear in the document. Once you know the last content stream has 
been written out (say, just before you write the xref tables) you know 
no more new glyphs will be used and no new fonts will be referenced, so 
you can write out the fonts you need.
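One way to picture the single-pass approach: a font is given an object number the first time a content stream refers to it, so the indirect reference (`N 0 R`) can be written immediately, while the font object itself, with its final subset, is only written when the document is finished, just before the xref table. The sketch below is hypothetical (`DeferredFonts` is not an FOP class), models output as a list of strings rather than real PDF syntax, and treats each character as one glyph.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/**
 * Hypothetical sketch: fonts get an object number when first referenced,
 * but their subset data is only written once all content is known.
 */
public final class DeferredFonts {
    private int nextObjectNumber = 1;  // a real writer would share numbering with content objects
    private final Map<String, Integer> fontObjects = new LinkedHashMap<>();
    private final Map<String, TreeSet<Character>> usedGlyphs = new LinkedHashMap<>();
    private final List<String> output = new ArrayList<>();

    /** Called while writing a content stream; returns a reference that is usable immediately. */
    public String useGlyphs(String font, String text) {
        int objNum = fontObjects.computeIfAbsent(font, f -> nextObjectNumber++);
        TreeSet<Character> glyphs = usedGlyphs.computeIfAbsent(font, f -> new TreeSet<>());
        for (char c : text.toCharArray()) {
            glyphs.add(c);
        }
        return objNum + " 0 R";
    }

    public void writeContentStream(String description) {
        output.add("stream: " + description);
    }

    /** Called just before the xref table: no new glyphs can appear after this point. */
    public List<String> finish() {
        for (Map.Entry<String, Integer> e : fontObjects.entrySet()) {
            output.add(e.getValue() + " 0 obj: " + e.getKey()
                    + " subset " + usedGlyphs.get(e.getKey()));
        }
        return output;
    }
}
```

Both pages reference the same font object, and the single subset written at the end covers every glyph either page used.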

The only operation in PDF that is (almost) forced to be post-process is 
writing out linearised ("fast web view" or "web optimized") PDF. That's
because web-optimized PDF must have a partial xref table and the trailer
dictionary near the *start* of the file. It's actually still possible to
create linearised PDF by streaming it out in a single pass, but you need
to know more in advance about what you'll be writing out, so in practice
it's much simpler to linearise by post-processing.

> Also, the requirements aren't clear here; what is it we want? Let
> me validate that: this shouldn't change the (I guess we can call it)
> "canonical" PDF document. By that I mean if you rasterized a PDF
> before and after this change they should be identical,
> pixel-for-pixel.
I agree.
> When Acrobat does the font de-duping (I don't
> remember how much control it gives you, but if there are levels of
> de-duping I would have chosen the most aggressive), the documents
> aren't identical.
That's because it's actually substituting fonts, replacing one font with 
another with non-identical metrics. That's not what I want to do, I want 
to *merge* overlapping subsets of fonts with identical metrics. Since 
the font dictionary gives the metric information it's practical to do 
this. If fonts don't have the same metrics, you don't de-dupe them 
because they're not duplicates.

"Optimizing" a PDF by substituting one font for another is a completely 
different and much bigger job. Replacing one font with a non-identical
font may require rewriting content streams (for encoding differences),
producing multiple font dictionaries with different encodings so that
different content streams can use one font file, etc. It's hairy and
complicated and I don't want to go there.

> There are aberrations caused by slight kerning
> differences between various versions of Arial. This may seem trivial
> when compared to bloated PDFs, but it looks tacky and lowers the high
> standard of documents.
If the metrics don't match, they're not the same font and they don't get 
merged. The glyph metrics in the font dictionary should be sufficient to 
handle this.

Having three partial subsets of Arial in a document, each slightly 
different versions with slightly different metrics, is something I can 
live with. The problem arises when you have 10 different 
mostly-overlapping subsets of the *exact* *same* *glyph* *data* from 
each of those, leaving you with *30* small-ish copies of Arial instead 
of 3 slightly larger ones.

> The other issue is you have subset fonts created by FOP as well as
> those imported by the pdf-image-plugin. You'd have to create some
> bridge between the image loading framework and the font loading system
> *cough* HACK *cough*.
Only if you want to handle de-dupe between fop-loaded fonts and fonts 
loaded from input PDFs. I don't think that's particularly vital, but it 
might not be as bad as you think either.

The font matching and subset merging system required for pdf-image to 
de-dupe fonts would have to track glyph metrics, font names, etc for 
every font seen, and would need to accumulate information on needed 
glyphs, etc until the end of output generation just before the xref is 
written. Fop must maintain used-glyph information as it stands, and 
already knows glyph metrics, so it's entirely practical for it to report 
that into the same system. From there, it's not too much of a stretch to 
see pdf-image recognising that fop is going to embed a font with the 
same name and metrics already and just merging its required-glyph list 
with fop's before fop generates the subset.

That's a significantly bigger project, though. Just being able to merge 
completely redundant glyph subsets where the glyph data and metrics are 
exactly identical between partially overlapping subsets being loaded by 
fop-pdf-image would be a nice start.

The best thing about all this is that it's practical to do it progressively.

>   Alternatively, just thinking aloud here, if this
> was done as a post-process *wink* *wink* *wry smile*...
While it can be done in post-process, I'm really not convinced it's 
necessary. FOP handles image scaling and resampling - why don't we do 
that in post-process, too? Just generate a monstrously huge PDF full of 
uncompressed images, then re-sample later?

The answer seems to be that it's practical to do it in one pass, it's
nicer for users, and it works well.

Why does fop have font subsetting support? Subsetting can be done in 
post-process, all you have to do is read the content streams and 
determine which glyphs are used, then rewrite the font. It's done in a 
single pass because it's *much* easier to implement that way, when fop 
already knows the glyphs it's used. Same deal: it could be done in a 
post pass, but it isn't because it doesn't make sense to do so.

Font replacement and the substitution of non-identical fonts should be 
done in post, because it's not practical to do them in a way that's 
going to be easy, reliable and automatic, nor are there any obvious 
correct choices. We don't know if the document designer wants to replace 
their own copy of Helvetica with Adobe's multi-master version. On the 
other hand, it's pretty bloody obvious that the user won't want 100 
copies each of "abcdefg...." glyphs from Helvetica LT Std that are 
*exactly* *the* *same* when they can have just one copy of each with no 
effect on document display.

> Apologies if I seem to be argumentative here, it's not my
> intention, but I feel this would be serious scope creep. I see the
> pdf-image-plugin as a plugin that treats PDFs as images, nothing more.
> If you want to stitch together PDFs, PDFBox is designed just for that.

The trouble is that fop-pdf-image exists because PDFs aren't just 
images. If they were, it'd be much easier to just rasterise them and 
import them in raster form.

FWIW, I'm not trying to use fop to "stitch together PDFs", not in the
sense of trying to use it to append, n-up, impose, etc. complex PDF
documents. I'm using small PDFs that are basically "images", but
represented as a combination of raster, text and vector data that should
be included in the output document as efficiently as possible and
without loss of fidelity. IOW, exactly what fop-pdf-image is for.

--
Craig Ringer

Re: Fwd: Google Summer of Code

Posted by mehdi houshmand <me...@gmail.com>.

Font de-duping is intrinsically a post-process action: you need the
full document, with all fonts, before you can do any font de-duping.
PostScript does this very thing (to a much lesser extent) with the
<optimize-resources> tag, as a post-process action.

Also, the requirements aren't clear here; what is it we want? Let
me validate that: this shouldn't change the (I guess we can call it)
"canonical" PDF document. By that I mean if you rasterized a PDF
before and after this change they should be identical,
pixel-for-pixel. When Acrobat does the font de-duping (I don't
remember how much control it gives you, but if there are levels of
de-duping I would have chosen the most aggressive), the documents
aren't identical. There are aberrations caused by slight kerning
differences between various versions of Arial. This may seem trivial
when compared to bloated PDFs, but it looks tacky and lowers the high
standard of documents. You could argue this could be configurable...
But then I'd reiterate my first argument: this is a post-process
action, not the concern of FOP or the pdf-image-plugin.

The other issue is you have subset fonts created by FOP as well as
those imported by the pdf-image-plugin. You'd have to create some
bridge between the image loading framework and the font loading system
*cough* HACK *cough*. Alternatively, just thinking aloud here, if this
was done as a post-process *wink* *wink* *wry smile*...

Apologies if I seem to be argumentative here, it's not my
intention, but I feel this would be serious scope creep. I see the
pdf-image-plugin as a plugin that treats PDFs as images, nothing more.
If you want to stitch together PDFs, PDFBox is designed just for that.

Mehdi

On 6 March 2012 10:36, Chris Bowditch <bo...@hotmail.com> wrote:
> On 06/03/2012 10:12, mehdi houshmand wrote:
>>
>> I fat-fingered the reply button instead of reply-to-all... *face-palm*
>
>
> Mehdi, Craig,
> <snip/>
>
>
>
>>> - Anything in the proposed XSL-FO 2.0 feature list (though most of it
>>> won't be realistic for GSoC projects);
>>>
>>> - Merge fop-pdf-image and implement smart merging of font, profile, and
>>> image resources. I'm working on this one at the moment, but slowly and
>>> only amid other projects.
>>
>> I really don't think that's a suitable project. I responded to your
>> post, so maybe we could take this conversation elsewhere, but this
>> really isn't FOP's responsibility, or for that matter the
>> pdf-image-plugin's. If anything, I'd argue that's a PDFBox project;
>> Adobe Acrobat Pro does this kind of thing (badly, may I add) as a
>> post-process action and I think that's the correct way to do it. The
>> other thing to say is that a newcomer may not appreciate the
>> importance of fidelity where fonts are concerned. Basically it's too
>> difficult for a student given a few months and no previous experience.
>
> Sorry Mehdi, I don't agree. I think this would be a great project. Craig
> already outlined what needs to be done and there's a lot of stuff in XGC and
> FOP as well as the plug-in. I'm not sure anything is needed in PDFBox, but
> even if it is, that is an Apache project too and the student can submit
> patches there. Adobe Acrobat may make some assumptions that don't always hold
> true, but our customers are crying out for FOP to create smaller PDF files
> when importing multiple PDF images with embedded fonts. This also feels
> reasonably well defined thanks to Craig's list of TODOs and feels like it
> can be done in 3 months. It gets a +1 from me.
>
> Thanks,
>
> Chris
>
>>> --
>>> Craig Ringer
>>
>>
>

Re: Fwd: Google Summer of Code

Posted by Chris Bowditch <bo...@hotmail.com>.

On 06/03/2012 10:12, mehdi houshmand wrote:
> I fat-fingered the reply button instead of reply-to-all... *face-palm*

Mehdi, Craig,
<snip/>


>> - Anything in the proposed XSL-FO 2.0 feature list (though most of it won't
>> be realistic for GSoC projects);
>>
>> - Merge fop-pdf-image and implement smart merging of font, profile, and
>> image resources. I'm working on this one at the moment, but slowly and only
>> amid other projects.
> I really don't think that's a suitable project. I responded to your
> post, so maybe we could take this conversation elsewhere, but this
> really isn't FOP's responsibility, or for that matter the
> pdf-image-plugin's. If anything, I'd argue that's a PDFBox project;
> Adobe Acrobat Pro does this kind of thing (badly, may I add) as a
> post-process action and I think that's the correct way to do it. The
> other thing to say is that a newcomer may not appreciate the
> importance of fidelity where fonts are concerned. Basically it's too
> difficult for a student given a few months and no previous experience.
Sorry Mehdi, I don't agree. I think this would be a great project. Craig
already outlined what needs to be done and there's a lot of stuff in XGC
and FOP as well as the plug-in. I'm not sure anything is needed in
PDFBox, but even if it is, that is an Apache project too and the student
can submit patches there. Adobe Acrobat may make some assumptions that
don't always hold true, but our customers are crying out for FOP to
create smaller PDF files when importing multiple PDF images with
embedded fonts. This also feels reasonably well defined thanks to
Craig's list of TODOs and feels like it can be done in 3 months. It gets
a +1 from me.

Thanks,

Chris

>> --
>> Craig Ringer
>

Fwd: Google Summer of Code

Posted by mehdi houshmand <me...@gmail.com>.

I fat-fingered the reply button instead of reply-to-all... *face-palm*

---------- Forwarded message ----------
From: mehdi houshmand <me...@gmail.com>
Date: 6 March 2012 09:33
Subject: Re: Google Summer of Code
To: Craig Ringer <ri...@ringerc.id.au>

On 6 March 2012 00:16, Craig Ringer <ri...@ringerc.id.au> wrote:
> On 03/05/2012 09:35 PM, mehdi houshmand wrote:
>>
>> Because of the overwhelming popularity of this idea, I've created a
>> link on the Wiki
>> (http://wiki.apache.org/xmlgraphics-fop/GoogleSummerOfCode2012) for
>> the GSoC proposals.
>>
> Things that come to mind for me:
>
> - PDFBox backend (probably ideal for GSoC, nice and self contained, great
> for someone who knows PDFBox and wants to learn fop's codebase);
>
> - CID fonts in PostScript (good for someone who knows PS and fonts, not
> necessarily XSL-FO so much);

There is already a big body of work that does this, check the
TrueTypeInPostScript branch as well as the patch
https://issues.apache.org/bugzilla/show_bug.cgi?id=50483. This stuff
needs to be merged into trunk and we do have that on our agenda,
but... I don't make the rules.

>
> - Using automatic +- kerning, +- tracking *and* +- horizontal type scaling
> adjustment to better auto-fit text, involving support for font-stretch
> property. This touches on layout so it may not be practical for a 1st fop
> project, but may not be too bad since fop already adjusts tracking when
> justifying text. The key interest points would be *negative* tracking,
> kerning and (if nothing else works) glyph-scaling for tighter type-fitting
> where it's not desirable to break to a new line due to widow/orphan policy
> or because it'd create large holes. This is particularly important when long
> unbreakable words must fit a fixed width space.

This sounds pretty interesting!! Could you put this and maybe a little
more information in a proposal similar to
https://issues.apache.org/jira/browse/COMDEV-66 or
https://issues.apache.org/jira/browse/COMDEV-67 and I'll create a JIRA
issue.

>
> - PDF/X-1a with CMYK;

I have no idea what is involved here, sounds like a lot of time in the
spec and battling FOP, but as I said, those are baseless assumptions.
Is that an interesting project?

>
> - Anything in the proposed XSL-FO 2.0 feature list (though most of it won't
> be realistic for GSoC projects);
>
> - Merge fop-pdf-image and implement smart merging of font, profile, and
> image resources. I'm working on this one at the moment, but slowly and only
> amid other projects.

I really don't think that's a suitable project. I responded to your
post, so maybe we could take this conversation elsewhere, but this
really isn't FOP's responsibility, or for that matter the
pdf-image-plugin's. If anything, I'd argue that's a PDFBox project.
Adobe Acrobat Pro does this kind of thing (badly, may I add) as a
post-process action, and I think that's the correct way to do it. The
other thing to say is that a newcomer may not appreciate the
importance of fidelity where fonts are concerned. Basically, it's too
difficult for a student given a few months and no previous experience.

>
> --
> Craig Ringer

Re: Google Summer of Code

Posted by Craig Ringer <ri...@ringerc.id.au>.

On 03/05/2012 09:35 PM, mehdi houshmand wrote:
> Because of the overwhelming popularity of this idea, I've created a
> link on the Wiki
> (http://wiki.apache.org/xmlgraphics-fop/GoogleSummerOfCode2012) for
> the GSoC proposals.
>
Things that come to mind for me:

- PDFBox backend (probably ideal for GSoC, nice and self contained, 
great for someone who knows PDFBox and wants to learn fop's codebase);

- CID fonts in PostScript (good for someone who knows PS and fonts, not 
necessarily XSL-FO so much);

- Using automatic +- kerning, +- tracking *and* +- horizontal type 
scaling adjustment to better auto-fit text, involving support for 
font-stretch property. This touches on layout so it may not be practical 
for a 1st fop project, but may not be too bad since fop already adjusts 
tracking when justifying text. The key interest points would be 
*negative* tracking, kerning and (if nothing else works) glyph-scaling 
for tighter type-fitting where it's not desirable to break to a new line 
due to widow/orphan policy or because it'd create large holes. This is 
particularly important when long unbreakable words must fit a fixed 
width space.

- PDF/X-1a with CMYK;

- Anything in the proposed XSL-FO 2.0 feature list (though most of it 
won't be realistic for GSoC projects);

- Merge fop-pdf-image and implement smart merging of font, profile, and 
image resources. I'm working on this one at the moment, but slowly and 
only amid other projects.
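The auto-fit idea in the kerning/tracking item can be sketched as a two-step fallback: spread the overflow as negative tracking across the gaps between glyphs, and only if that hits its limit, scale the glyphs horizontally. This is a hypothetical Java sketch, not FOP's API: the class `TrackingFit`, its parameters, and the limits passed to it are illustrative assumptions. Advance widths are assumed to already include kerning, and tracking is applied uniformly per gap.

```java
/**
 * Hypothetical sketch (not FOP's API): fit an unbreakable word into a fixed
 * width by first tightening tracking and, only if that is not enough,
 * scaling glyphs horizontally.
 */
public final class TrackingFit {

    /** Result: tracking per inter-glyph gap (same units as widths) and horizontal scale. */
    public static final class Fit {
        public final double tracking;
        public final double scale;
        public final boolean fits;

        Fit(double tracking, double scale, boolean fits) {
            this.tracking = tracking;
            this.scale = scale;
            this.fits = fits;
        }

        @Override
        public String toString() {
            return String.format("tracking=%.3f scale=%.3f fits=%b", tracking, scale, fits);
        }
    }

    /**
     * @param advances    natural advance widths of the glyphs (kerning assumed already applied)
     * @param target      available width
     * @param minTracking most negative tracking allowed per gap (assumed limit, e.g. -0.5)
     * @param minScale    smallest horizontal scale allowed (assumed limit, e.g. 0.9)
     */
    public static Fit fit(double[] advances, double target, double minTracking, double minScale) {
        double natural = 0;
        for (double a : advances) {
            natural += a;
        }
        if (natural <= target) {
            return new Fit(0, 1, true); // already fits; this sketch only tightens
        }
        int gaps = Math.max(advances.length - 1, 0);
        // Step 1: spread the overflow evenly over the gaps, clamped at minTracking.
        double tracking = gaps == 0 ? 0 : Math.max((target - natural) / gaps, minTracking);
        if (natural + tracking * gaps <= target) {
            return new Fit(tracking, 1, true);
        }
        // Step 2: tracking alone is insufficient, so scale glyph widths.
        // Tracking is left unscaled: solve natural * scale + tracking * gaps = target.
        double scale = Math.max((target - tracking * gaps) / natural, minScale);
        boolean fits = natural * scale + tracking * gaps <= target + 1e-9;
        return new Fit(tracking, scale, fits);
    }

    public static void main(String[] args) {
        double[] word = {6.0, 5.5, 5.5, 6.0, 5.0}; // natural width 28.0
        System.out.println(fit(word, 27.0, -0.5, 0.9)); // tracking only
        System.out.println(fit(word, 24.0, -0.5, 0.9)); // tracking plus scaling
        System.out.println(fit(word, 20.0, -0.5, 0.9)); // cannot fit within the limits
    }
}
```

With the example word (natural width 28.0), a target of 27.0 is reached with tracking alone, while 24.0 needs tracking clamped at -0.5 plus a scale of about 0.93. Trying tracking before scaling follows the "if nothing else works" priority in the item, since scaling distorts glyph shapes.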

--
Craig Ringer

Re: Google Summer of Code

Posted by Glenn Adams <gl...@skynav.com>.

On Mon, Mar 5, 2012 at 9:32 AM, Chris Bowditch
<bo...@hotmail.com> wrote:

> On 05/03/2012 16:18, mehdi houshmand wrote:
>
>> I agree that there may be some good project ideas in the bug list, but I
>> don't think the one Alex highlighted is a good one. Changing the layout
>> algorithm is a major undertaking, probably several man-years :) We need to
>> find something small and well defined for a GSoC project, something that we
>> know can be completed in 2-3 months.
>
>
Agreed, that one is not a good one for a summer project, but there are
likely a number of other existing bugs that may be

Re: Google Summer of Code

Posted by Chris Bowditch <bo...@hotmail.com>.

On 05/03/2012 16:18, mehdi houshmand wrote:
> Hi Alex/Glenn,

Hi guys,

I agree that there may be some good project ideas in the bug list, but I 
don't think the one Alex highlighted is a good one. Changing the layout 
algorithm is a major undertaking, probably several man-years :) We need 
to find something small and well defined for a GSoC project, something 
that we know can be completed in 2-3 months.

Thanks,

Chris

>
> Yeah, that's a fair point. I think this may be a textbook case of
> Freudian projection, so my apologies if those weren't your intentions,
> Glenn.
>
> The problem is, I don't have a great deal of experience in the Layout
> Engine and I really have no grounds to put a proposal together. I've
> put forward the projects that I know about and think are interesting.
> If you want to put a project proposal forward, please do. If no one
> else steps forward as a mentor and an applicant takes an interest,
> I'll make the effort to learn the code.
>
> Mehdi
>

Re: Google Summer of Code

Posted by mehdi houshmand <me...@gmail.com>.

Hi Alex/Glenn,

Yeah, that's a fair point. I think this may be a textbook case of
Freudian projection, so my apologies if those weren't your intentions,
Glenn.

The problem is, I don't have a great deal of experience in the Layout
Engine and I really have no grounds to put a proposal together. I've
put forward the projects that I know about and think are interesting.
If you want to put a project proposal forward, please do. If no one
else steps forward as a mentor and an applicant takes an interest,
I'll make the effort to learn the code.

Mehdi


On 5 March 2012 15:48, Alexios Giotis <al...@gmail.com> wrote:
> I don't think that Glenn's idea is that bad. FOP's open Bugzilla issues are not only bugs; they also show which areas of FOP need to be improved. If we start from the beginning, then
>
> | 1063|New|Nor|2001-03-21|fop does not handle large fo files
>
> is a real, very interesting issue, and the solution is not to increase the Java heap size. There are workarounds such as caching objects, but a good solution might lie deeper in FOP's layout engine. What about checking or implementing Donald Knuth's first-fit or best-fit algorithms? In theory, that would allow FO tree and layout manager objects to be freed after the end of every page.
>
> There was a recent discussion about this, see
> http://apache.markmail.org/message/3ejv4opwcceipfpl?q=list:org%2Eapache%2Exmlgraphics%2Efop-users+total+best+fit
>
> Of course there will be drawbacks, FOP is complex (more complex than it should be in my opinion, cleanup / modularization would help) and this is not a simple task.
>
>
> Alex Giotis

Re: Google Summer of Code

Posted by Alexios Giotis <al...@gmail.com>.

I don't think that Glenn's idea is that bad. FOP's open Bugzilla issues are not only bugs; they also show which areas of FOP need to be improved. If we start from the beginning, then

| 1063|New|Nor|2001-03-21|fop does not handle large fo files   

is a real, very interesting issue, and the solution is not to increase the Java heap size. There are workarounds such as caching objects, but a good solution might lie deeper in FOP's layout engine. What about checking or implementing Donald Knuth's first-fit or best-fit algorithms? In theory, that would allow FO tree and layout manager objects to be freed after the end of every page.

There was a recent discussion about this, see
http://apache.markmail.org/message/3ejv4opwcceipfpl?q=list:org%2Eapache%2Exmlgraphics%2Efop-users+total+best+fit

Of course there will be drawbacks, FOP is complex (more complex than it should be in my opinion, cleanup / modularization would help) and this is not a simple task.
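To illustrate why first-fit breaking could let objects be released early, here is a hypothetical Java sketch (not FOP code) of greedy first-fit line breaking, assuming widths measured in characters and a single space between words. Each line is handed to a consumer the moment the next word no longer fits, so at most one line is pending at a time. A total-fit algorithm such as Knuth-Plass instead weighs all candidate breaks in a paragraph before committing to any of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of greedy first-fit line breaking; not FOP code. */
public final class FirstFit {

    /**
     * Emits each line to {@code sink} as soon as it is complete, so the caller
     * never holds more than one pending line in memory.
     */
    public static void breakLines(List<String> words, int maxWidth, Consumer<String> sink) {
        StringBuilder line = new StringBuilder();
        for (String word : words) {
            int needed = line.length() == 0 ? word.length() : line.length() + 1 + word.length();
            if (needed > maxWidth && line.length() > 0) {
                sink.accept(line.toString()); // commit: this line will never change again
                line.setLength(0);
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(word); // an over-long word simply overflows its own line
        }
        if (line.length() > 0) {
            sink.accept(line.toString()); // flush the last partial line
        }
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        breakLines(List.of("fop", "does", "not", "handle", "large", "fo", "files"), 10, lines::add);
        lines.forEach(System.out::println);
    }
}
```

Breaking the seven words of the bug title above at a width of 10 yields the lines `fop does`, `not handle`, `large fo` and `files`. The trade-off is quality: because earlier lines are never revisited, first-fit cannot rebalance a paragraph to avoid a loose line later on.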


Alex Giotis


On Mar 5, 2012, at 4:49 PM, mehdi houshmand wrote:

> Haha, if only it were that simple... The projects have to be
> interesting and fulfilling and at least bordering on fun. They also
> have to be an opportunity to learn and to encourage open-source
> development. There's little fun to be had fixing bugs hidden in the
> depths of FOP's fairly hard-to-penetrate code base. Also, and probably
> more importantly, I can't imagine it would serve as encouragement.
> 
> Mehdi

Re: Google Summer of Code

Posted by mehdi houshmand <me...@gmail.com>.

Haha, if only it were that simple... The projects have to be
interesting and fulfilling and at least bordering on fun. They also
have to be an opportunity to learn and to encourage open-source
development. There's little fun to be had fixing bugs hidden in the
depths of FOP's fairly hard-to-penetrate code base. Also, and probably
more importantly, I can't imagine it would serve as encouragement.

Mehdi

On 5 March 2012 14:36, Glenn Adams <gl...@skynav.com> wrote:
> I would suggest whittling down the fop bug list, starting from the
> beginning.

Re: Google Summer of Code

Posted by Glenn Adams <gl...@skynav.com>.

I would suggest whittling down the fop bug list, starting from the
beginning.

On Mon, Mar 5, 2012 at 6:35 AM, mehdi houshmand <me...@gmail.com> wrote:

> Because of the overwhelming popularity of this idea, I've created a
> link on the Wiki
> (http://wiki.apache.org/xmlgraphics-fop/GoogleSummerOfCode2012) for
> the GSoC proposals.
>
> On a serious note, this is literally work for free. Google pays the
> bills and I'm happy to mentor any applicants and do the admin; all you
> have to do is provide ideas for projects. If you have a wish list or a
> list of TODOs that you think a newbie could do for a summer project (I
> do appreciate that's quite a big caveat), now's your opportunity.
>
> Mehdi

Re: Google Summer of Code

Posted by Craig Ringer <ri...@ringerc.id.au>.

On 03/05/2012 09:35 PM, mehdi houshmand wrote:
> Because of the overwhelming popularity of this idea, I've created a
> link on the Wiki
> (http://wiki.apache.org/xmlgraphics-fop/GoogleSummerOfCode2012) for
> the GSoC proposals.
>

You note font library extraction as a possibility there. I'd like to 
note another possible motivation for extracting the font library: to 
then potentially permit it to be merged with or replaced by PDFBox's 
FontBox, reducing duplicated work.

--
Craig Ringer

Re: Google Summer of Code

Posted by mehdi houshmand <me...@gmail.com>.

Because of the overwhelming popularity of this idea, I've created a
link on the Wiki
(http://wiki.apache.org/xmlgraphics-fop/GoogleSummerOfCode2012) for
the GSoC proposals.

On a serious note, this is literally work for free. Google pays the
bills and I'm happy to mentor any applicants and do the admin; all you
have to do is provide ideas for projects. If you have a wish list or a
list of TODOs that you think a newbie could do for a summer project (I
do appreciate that's quite a big caveat), now's your opportunity.

Mehdi

On 1 March 2012 16:26, mehdi houshmand <me...@gmail.com> wrote:
> Hi Glenn,
>
> The GSoC doesn't relate directly to the ASF or FOP; however,
> putting a few FOP projects forward as proposals would be a good way to
> attract some new interest to the project. I think it would be good for us,
> as we benefit from any work done, and it helps whoever does the work
> learn the various skills that we as a community can impart to them.
>
> I've included a link to the GSoC below, but if you do some research,
> there's plenty of information out there.
>
> http://code.google.com/soc/
>
> Mehdi

Re: Google Summer of Code

Posted by mehdi houshmand <me...@gmail.com>.

Hi Glenn,

The GSoC isn't directly related to the ASF or FOP; however, putting
a few FOP projects forward as proposals would be a good way to attract
some new interest in the project. I think it would be good for us, as
we benefit from any work done, and it helps whoever does the work
learn the various skills that we as a community can impart to them.

I've included a link to the GSoC below, but if you do some research,
there's plenty of information out there.

http://code.google.com/soc/

Mehdi

Re: Google Summer of Code

Posted by Glenn Adams <gl...@skynav.com>.

Could you provide a link to the "Google Summer of Code Project"? How does
it relate to ASF and FOP activities?
