Posted to users@pdfbox.apache.org by Joel Hirsh <jo...@gmail.com> on 2021/02/17 07:15:06 UTC

AllAndNone font

I am extracting text from a PDF that uses a Type0 font called AllAndNone.

It seems that this font uses its own character definitions and has
characters for normal, bold, italic, etc. all in the same font.

Everything reads just fine via PDFBox's PDFTextStripper, but I really need to
know whether a character is bold or not.

If it were a Type1 font, I think I could extract the font, see which
characters are which, and create a map, although I'm not even sure
there is a standard mapping for this font, since I cannot find anything
about how to create text with it.

However, I don't understand how such a font is constructed as a Type0
font.  There is nothing with that name anywhere in the PDFBox source,
which is what I would expect to find if it is a Type0.  Clearly I am
missing something, and if anyone could explain, that would be great.

What I have in mind is to create a map so I can look up whether a character
is bold.
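
The lookup described above could be sketched roughly as follows. This is only an illustration, not PDFBox API: the class GlyphStyleMap, the enum Style, and the sample character codes are hypothetical, and the map would have to be filled in by hand after inspecting the glyphs of the embedded font. In PDFBox the codes could probably be taken from TextPosition.getCharacterCodes() inside a PDFTextStripper subclass; that wiring is left out so the block stays self-contained.

```java
import java.util.HashMap;
import java.util.Map;

public class GlyphStyleMap {
    /** Styles a single glyph of the combined font may represent. */
    public enum Style { REGULAR, BOLD, ITALIC, BOLD_ITALIC, UNKNOWN }

    // Key: character code as used in the PDF content stream (assumption).
    private final Map<Integer, Style> styles = new HashMap<>();

    public void put(int code, Style style) {
        styles.put(code, style);
    }

    /** Returns UNKNOWN for codes that were never classified. */
    public Style styleOf(int code) {
        return styles.getOrDefault(code, Style.UNKNOWN);
    }

    public boolean isBold(int code) {
        Style s = styleOf(code);
        return s == Style.BOLD || s == Style.BOLD_ITALIC;
    }

    public static void main(String[] args) {
        GlyphStyleMap map = new GlyphStyleMap();
        // Hypothetical codes, invented for this example only.
        map.put(0x0041, Style.REGULAR);
        map.put(0x0141, Style.BOLD);
        System.out.println("0x0141 bold? " + map.isBold(0x0141));
        System.out.println("0x0999 style: " + map.styleOf(0x0999));
    }
}
```

Returning UNKNOWN for unclassified codes, rather than defaulting to REGULAR, makes gaps in a hand-built map visible.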

Re: AllAndNone font

Posted by Joel Hirsh <jo...@gmail.com>.
Here is a PDF with a snippet from such a file.

https://drive.google.com/file/d/1aW8b20LCgvWZ9Fn8JvUrkhAwXoQCibGH/view?usp=sharing

Thanks


On Tue, Feb 16, 2021 at 11:23 PM Tilman Hausherr <TH...@t-online.de>
wrote:

> Please upload the PDF to a sharehoster
>
> Tilman

Re: AllAndNone font

Posted by Tilman Hausherr <TH...@t-online.de>.
Please upload the PDF to a sharehoster

Tilman



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: AllAndNone font

Posted by Tilman Hausherr <TH...@t-online.de>.
On 19.02.2021 at 18:57, Joel Hirsh wrote:
> My original thought was to look at the specs for the full font, but in
> Google searches I cannot find any description of it.  Since it is a
> Type0 font, I thought it would be defined somewhere in FontBox, but it is
> not one of the 14 Adobe-defined fonts.  I think that is the part I just
> don't understand: how can it be a Type0 font?

Why would it be a standard 14 font? Those are well known (4 Times, 4
Helvetica, 4 Courier, Symbol, ZapfDingbats), and your font isn't one of
them. There are several types of fonts, and yours is "Type 0".


https://en.wikipedia.org/wiki/PostScript_fonts#Type_0

Tilman
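
To make the distinction concrete: "Type 0" names a font structure (a composite font whose glyphs come from a descendant CIDFont, normally embedded in the PDF itself), while the standard 14 are a fixed list of font names. A minimal sketch of a name check, assuming the base font name has already been read from the PDF; the class Standard14Check is hypothetical and not part of PDFBox:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class Standard14Check {
    // The 14 standard PDF font names: 4 Times, 4 Helvetica, 4 Courier,
    // Symbol and ZapfDingbats.
    private static final Set<String> STANDARD_14 = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList(
                    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
                    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique",
                    "Helvetica-BoldOblique",
                    "Courier", "Courier-Bold", "Courier-Oblique",
                    "Courier-BoldOblique",
                    "Symbol", "ZapfDingbats")));

    /** True only if the name is exactly one of the standard 14 names. */
    public static boolean isStandard14(String baseFontName) {
        return baseFontName != null && STANDARD_14.contains(baseFontName);
    }

    public static void main(String[] args) {
        System.out.println("Helvetica-Bold: " + isStandard14("Helvetica-Bold"));
        System.out.println("AllAndNone: " + isStandard14("AllAndNone"));
    }
}
```

A name that fails this check, such as AllAndNone, is simply some other font; its glyph data has to come from the PDF, not from a list built into the library.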




Re: AllAndNone font

Posted by Joel Hirsh <jo...@gmail.com>.
My original thought was to look at the specs for the full font, but in
Google searches I cannot find any description of it.  Since it is a
Type0 font, I thought it would be defined somewhere in FontBox, but it is
not one of the 14 Adobe-defined fonts.  I think that is the part I just
don't understand: how can it be a Type0 font?

On Wed, Feb 17, 2021 at 10:54 PM Tilman Hausherr <TH...@t-online.de>
wrote:

> Hi,
>
> I had a look with PDFDebugger and I don't see a way to identify which
> ones are bold etc, unless you start to analyse the shapes. (or better,
> compare them with the full font)
>
> Tilman

Re: AllAndNone font

Posted by Tilman Hausherr <TH...@t-online.de>.
Hi,

I had a look with PDFDebugger and I don't see a way to identify which
ones are bold etc., unless you start to analyse the shapes (or better,
compare them with the full font).

Tilman
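
One possible way to "analyse the shapes" is sketched below. It is a rough heuristic, not something PDFBox provides: it measures how much of a glyph's bounding box is covered by ink and checks whether that value is closer to a bold or a regular reference outline of the same letter. The class GlyphInk and its methods are hypothetical, and the L-shaped outlines are synthetic stand-ins. With PDFBox, the outlines would presumably come from PDVectorFont.getPath(int code); that part is an assumption and is not shown.

```java
import java.awt.Shape;
import java.awt.geom.Area;
import java.awt.geom.PathIterator;
import java.awt.geom.Rectangle2D;

public class GlyphInk {
    /** Filled area of an outline: shoelace formula over the flattened path. */
    public static double inkArea(Shape outline) {
        // Area resolves overlaps; holes come out with opposite orientation,
        // so the signed sums of the subpaths add up to the net filled area.
        PathIterator it = new Area(outline).getPathIterator(null, 0.01);
        double[] c = new double[6];
        double sum = 0, startX = 0, startY = 0, prevX = 0, prevY = 0;
        while (!it.isDone()) {
            switch (it.currentSegment(c)) {
                case PathIterator.SEG_MOVETO:
                    startX = prevX = c[0];
                    startY = prevY = c[1];
                    break;
                case PathIterator.SEG_LINETO:
                    sum += prevX * c[1] - c[0] * prevY;
                    prevX = c[0];
                    prevY = c[1];
                    break;
                case PathIterator.SEG_CLOSE:
                    sum += prevX * startY - startX * prevY;
                    prevX = startX;
                    prevY = startY;
                    break;
                default:
                    break; // a flattened path contains no curve segments
            }
            it.next();
        }
        return Math.abs(sum) / 2.0;
    }

    /** Fraction of the bounding box covered by ink (0 for empty shapes). */
    public static double coverage(Shape outline) {
        Rectangle2D b = outline.getBounds2D();
        double box = b.getWidth() * b.getHeight();
        return box == 0 ? 0 : inkArea(outline) / box;
    }

    /** True if the glyph's coverage is nearer the bold reference's. */
    public static boolean closerToBold(Shape glyph, Shape regularRef, Shape boldRef) {
        double g = coverage(glyph);
        return Math.abs(g - coverage(boldRef)) < Math.abs(g - coverage(regularRef));
    }

    /** Synthetic "L" in a 10x10 box with the given stroke thickness. */
    public static Shape lShape(double thickness) {
        Area a = new Area(new Rectangle2D.Double(0, 0, thickness, 10));
        a.add(new Area(new Rectangle2D.Double(0, 0, 10, thickness)));
        return a;
    }

    public static void main(String[] args) {
        Shape regular = lShape(1), bold = lShape(3), unknown = lShape(2.5);
        System.out.println("closer to bold? " + closerToBold(unknown, regular, bold));
    }
}
```

Coverage is a crude measure and only makes sense when comparing the same letter across styles, which is why comparing against the full font, as suggested above, is the more reliable route.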
