Posted to fop-users@xmlgraphics.apache.org by "dvineshkumar@gmail.com" <dv...@gmail.com> on 2015/08/11 16:36:51 UTC

Re: FOP2.0 taking more time to format complex script documents

Hi,

After some analysis, I found a bug in the MultiByteFont::findGlyphIndex() method.
In FOP 2.0, the for loop in MultiByteFont::findGlyphIndex() continues
even after the glyph is found. I updated findGlyphIndex() to terminate
the loop once the glyph is found, and performance improved considerably.
The existing and updated methods are shown below.

Existing:

    public int findGlyphIndex(int c) {
        int idx = c;
        int retIdx = SingleByteEncoding.NOT_FOUND_CODE_POINT;

        // for most users the most likely glyphs are in the first cmap segments
        // (meaning the one with the lowest unicode start values)
        if (idx < NUM_MOST_LIKELY_GLYPHS && mostLikelyGlyphs[idx] != 0) {
            return mostLikelyGlyphs[idx];
        }
        for (CMapSegment i : cmap) {
            if (retIdx == 0
                    && i.getUnicodeStart() <= idx
                    && i.getUnicodeEnd() >= idx) {
                retIdx = i.getGlyphStartIndex()
                    + idx
                    - i.getUnicodeStart();
                if (idx < NUM_MOST_LIKELY_GLYPHS) {
                    mostLikelyGlyphs[idx] = retIdx;
                }
            }
        }
        return retIdx;
    }

Updated:

    public int findGlyphIndex(int c) {
        int idx = c;
        int retIdx = SingleByteEncoding.NOT_FOUND_CODE_POINT;

        // for most users the most likely glyphs are in the first cmap segments
        // (meaning the one with the lowest unicode start values)
        if (idx < NUM_MOST_LIKELY_GLYPHS && mostLikelyGlyphs[idx] != 0) {
            return mostLikelyGlyphs[idx];
        }

        for (int i = 0; (i < cmap.size()) && retIdx == 0; i++) {
            if (cmap.get(i).getUnicodeStart() <= idx
                    && cmap.get(i).getUnicodeEnd() >= idx) {
                retIdx = cmap.get(i).getGlyphStartIndex()
                    + idx
                    - cmap.get(i).getUnicodeStart();
                if (idx < NUM_MOST_LIKELY_GLYPHS) {
                    mostLikelyGlyphs[idx] = retIdx;
                }
            }
        }
        return retIdx;
    }
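
The same early exit can also be written with a direct return inside a
for-each loop. What follows is a minimal, self-contained sketch and not the
FOP source: EarlyReturnGlyphLookup and Segment are hypothetical stand-ins for
MultiByteFont and CMapSegment, the value 256 for NUM_MOST_LIKELY_GLYPHS and
the value 0 for the "not found" code are assumptions, and segments are
assumed not to overlap.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for MultiByteFont, reduced to the lookup logic.
final class EarlyReturnGlyphLookup {
    // Assumed values; the quoted method uses SingleByteEncoding.NOT_FOUND_CODE_POINT
    // and a NUM_MOST_LIKELY_GLYPHS constant whose value is not shown in the thread.
    static final int NOT_FOUND = 0;
    static final int NUM_MOST_LIKELY_GLYPHS = 256;

    // Hypothetical stand-in for CMapSegment: maps the code points
    // [unicodeStart, unicodeEnd] to consecutive glyph indices.
    static final class Segment {
        final int unicodeStart;
        final int unicodeEnd;
        final int glyphStartIndex;

        Segment(int unicodeStart, int unicodeEnd, int glyphStartIndex) {
            this.unicodeStart = unicodeStart;
            this.unicodeEnd = unicodeEnd;
            this.glyphStartIndex = glyphStartIndex;
        }
    }

    private final List<Segment> cmap = new ArrayList<>();
    private final int[] mostLikelyGlyphs = new int[NUM_MOST_LIKELY_GLYPHS];

    void addSegment(int unicodeStart, int unicodeEnd, int glyphStartIndex) {
        cmap.add(new Segment(unicodeStart, unicodeEnd, glyphStartIndex));
    }

    int findGlyphIndex(int c) {
        boolean cacheable = c >= 0 && c < NUM_MOST_LIKELY_GLYPHS;
        // Serve low code points from the cache once they have been resolved.
        if (cacheable && mostLikelyGlyphs[c] != 0) {
            return mostLikelyGlyphs[c];
        }
        for (Segment s : cmap) {
            if (s.unicodeStart <= c && s.unicodeEnd >= c) {
                int glyph = s.glyphStartIndex + c - s.unicodeStart;
                if (cacheable) {
                    mostLikelyGlyphs[c] = glyph;
                }
                // Stop scanning as soon as the covering segment is found.
                return glyph;
            }
        }
        return NOT_FOUND;
    }
}
```

Returning from inside the loop removes the need to re-test retIdx on every
iteration; whether that style is acceptable depends on the project's coding
conventions.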

Regards,
Vinesh Kumar. D 




--
View this message in context: http://apache-fop.1065347.n5.nabble.com/FOP2-0-taking-more-time-to-format-complex-script-documents-tp42461p42749.html
Sent from the FOP - Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: fop-users-unsubscribe@xmlgraphics.apache.org
For additional commands, e-mail: fop-users-help@xmlgraphics.apache.org

Re: FOP2.0 taking more time to format complex script documents

Posted by Matthias Reischenbacher <ma...@gmx.at>.

Hi,

thanks for your analysis. I've committed a fix as part of
https://issues.apache.org/jira/browse/FOP-2530

Best regards,
Matthias


Re: FOP2.0 taking more time to format complex script documents

Posted by Pascal Sancho <ps...@gmail.com>.

Hi,

AFAIK, there are no rules that prevent such usage.
As a starting point, you can follow this:
http://xmlgraphics.apache.org/fop/dev/conventions.html

2015-08-13 10:15 GMT+02:00 Klaus Malorny <Kl...@knipp.de>:
> Just out of curiosity: are breaks and returns within loops forbidden in your
> coding conventions? ;-)
>
> By the way, if this is really a performance bottleneck and the number of
> segments is typically large (say, >= 10), I would sort the segments by
> their starts, convert the three values into arrays (during object
> construction), perform a binary search on the starts, then test for
> the end, and finally calculate the index.
>
> Regards,
> Klaus

-- 
pascal


Re: FOP2.0 taking more time to format complex script documents

Posted by Klaus Malorny <Kl...@knipp.de>.

Just out of curiosity: are breaks and returns within loops forbidden in your
coding conventions? ;-)

By the way, if this is really a performance bottleneck and the number of
segments is typically large (say, >= 10), I would sort the segments by
their starts, convert the three values into arrays (during object
construction), perform a binary search on the starts, then test for the
end, and finally calculate the index.
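
A rough sketch of that idea, under stated assumptions: SortedSegmentLookup is
a hypothetical helper and not part of FOP, each input row is taken to be
{unicodeStart, unicodeEnd, glyphStartIndex}, segments are assumed not to
overlap, and 0 is assumed to be the "not found" value. It relies only on
java.util.Arrays.

```java
import java.util.Arrays;

// Hypothetical helper: segment data flattened into three parallel arrays,
// sorted by segment start once at construction time.
final class SortedSegmentLookup {
    static final int NOT_FOUND = 0; // assumed "no glyph" value

    private final int[] starts;
    private final int[] ends;
    private final int[] glyphStarts;

    // Each row is {unicodeStart, unicodeEnd, glyphStartIndex}.
    SortedSegmentLookup(int[][] segments) {
        int[][] sorted = segments.clone(); // leave the caller's row order untouched
        Arrays.sort(sorted, (a, b) -> Integer.compare(a[0], b[0]));
        int n = sorted.length;
        starts = new int[n];
        ends = new int[n];
        glyphStarts = new int[n];
        for (int i = 0; i < n; i++) {
            starts[i] = sorted[i][0];
            ends[i] = sorted[i][1];
            glyphStarts[i] = sorted[i][2];
        }
    }

    int findGlyphIndex(int c) {
        int pos = Arrays.binarySearch(starts, c);
        // On a miss, binarySearch returns -(insertionPoint) - 1, so the only
        // segment that can contain c is the one just before the insertion point.
        int i = pos >= 0 ? pos : -pos - 2;
        if (i < 0 || c > ends[i]) {
            return NOT_FOUND; // c lies before the first segment or in a gap
        }
        return glyphStarts[i] + c - starts[i];
    }
}
```

Each lookup then costs O(log n) comparisons instead of a scan over all
segments. Whether that pays off for typical segment counts would need to be
measured.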

Regards,
Klaus




Re: FOP2.0 taking more time to format complex script documents

Posted by Pascal Sancho <ps...@gmail.com>.

Hi,

Could you please file a Jira entry, attaching all materials (test
case, patch, etc.)?


-- 
pascal
