Posted to users@pdfbox.apache.org by shah manon <sy...@yahoo.com.INVALID> on 2021/12/06 12:52:17 UTC

How to get MuPDF TextPage functionality with PDFBox?

For organizing books and articles, I need a lightweight PDF viewer in JavaFX with copy, highlight, image snapshot, sticky note and search features. By googling I came across PDFBox and MuPDF. MuPDF has a class, TextPage, which is amazing, but MuPDF is written in C and its Java binding exposes only a subset of its original API.
As I am very new to PDFBox, can anybody please tell me how I can get the functionality of extractText(), extractTEXT(), extractBLOCKS(), extractWORDS(), extractHTML(), extractXHTML(), extractXML(), extractDICT(), extractJSON(), extractRAWDICT(), extractRAWJSON() and search() using PDFBox?
Nadvi.

Re: How to get MuPDF TextPage functionality with PDFBox?

Posted by Tilman Hausherr <TH...@t-online.de>.
Hi,

Text extraction is available from the PDFTextStripper class. A subclass 
can create HTML. All the rest you'll have to write yourself.
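For the plain-text case (roughly MuPDF's extractText()), here is a minimal sketch, assuming PDFBox 2.0.x. It runs PDFTextStripper once per page by narrowing setStartPage/setEndPage. PageTextSketch and extractPageTexts are illustrative names, not PDFBox API.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

/** Hypothetical helper (not part of PDFBox): plain text per page, roughly like extractText(). */
public class PageTextSketch {

    /** Returns the extracted text of every page, one list entry per page. */
    public static List<String> extractPageTexts(PDDocument document) throws IOException {
        PDFTextStripper stripper = new PDFTextStripper();
        // Order text by its position on the page rather than by content-stream order.
        stripper.setSortByPosition(true);
        List<String> pages = new ArrayList<>();
        for (int p = 1; p <= document.getNumberOfPages(); p++) {
            // PDFTextStripper page numbers are 1-based and inclusive.
            stripper.setStartPage(p);
            stripper.setEndPage(p);
            pages.add(stripper.getText(document));
        }
        return pages;
    }

    public static void main(String[] args) throws IOException {
        // PDFBox 2.0.x loading API; PDFBox 3.x uses Loader.loadPDF(File) instead.
        try (PDDocument document = PDDocument.load(new File(args[0]))) {
            List<String> pages = extractPageTexts(document);
            for (int i = 0; i < pages.size(); i++) {
                System.out.println("--- page " + (i + 1) + " ---");
                System.out.print(pages.get(i));
            }
        }
    }
}
```

For HTML, the pdfbox-tools artifact ships PDFText2HTML (org.apache.pdfbox.tools.PDFText2HTML in 2.0.x), a PDFTextStripper subclass that wraps the extracted text in HTML. It is used the same way, via getText(document).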

Tilman
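The "write it yourself" part includes word-level functions such as extractWORDS() and search(). One possible approach is sketched here, assuming PDFBox 2.0.x. It subclasses PDFTextStripper, overrides writeString(String, List<TextPosition>), and groups the TextPosition glyphs into words with approximate bounding boxes built from getXDirAdj(), getYDirAdj(), getWidthDirAdj() and getHeightDir(). WordBoxStripper, Word, extractWords and search are illustrative names, not PDFBox API. The search is a simple case-insensitive match within single words, not MuPDF's phrase search.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

/** Hypothetical helper (not part of PDFBox): words with approximate boxes, roughly like extractWORDS(). */
public class WordBoxStripper extends PDFTextStripper {

    /** A word on a page; coordinates in PDF units, origin at the page's top-left. */
    public static final class Word {
        public final int page;
        public final String text;
        public final float x0, y0, x1, y1;

        Word(int page, String text, float x0, float y0, float x1, float y1) {
            this.page = page;
            this.text = text;
            this.x0 = x0;
            this.y0 = y0;
            this.x1 = x1;
            this.y1 = y1;
        }
    }

    private final List<Word> words = new ArrayList<>();
    private final StringBuilder current = new StringBuilder();
    private float x0, y0, x1, y1;

    public WordBoxStripper() throws IOException {
        setSortByPosition(true); // reading order by position, not content-stream order
    }

    @Override
    protected void writeString(String text, List<TextPosition> positions) throws IOException {
        // Group glyphs ourselves: one call may contain several space-separated words.
        for (TextPosition tp : positions) {
            String u = tp.getUnicode();
            if (u == null || u.trim().isEmpty()) {
                flushWord();
                continue;
            }
            float left = tp.getXDirAdj();
            float bottom = tp.getYDirAdj();          // baseline, measured from the top
            float top = bottom - tp.getHeightDir();
            float right = left + tp.getWidthDirAdj();
            if (current.length() == 0) {
                x0 = left; y0 = top; x1 = right; y1 = bottom;
            } else {
                x0 = Math.min(x0, left);
                y0 = Math.min(y0, top);
                x1 = Math.max(x1, right);
                y1 = Math.max(y1, bottom);
            }
            current.append(u);
        }
        flushWord(); // consecutive calls are separated by word or line separators
    }

    private void flushWord() {
        if (current.length() > 0) {
            words.add(new Word(getCurrentPageNo(), current.toString(), x0, y0, x1, y1));
            current.setLength(0);
        }
    }

    /** Every word in the document, in page and reading order. */
    public static List<Word> extractWords(PDDocument document) throws IOException {
        WordBoxStripper stripper = new WordBoxStripper();
        stripper.getText(document); // the returned plain text is not needed here
        return stripper.words;
    }

    /** Case-insensitive match within single words; phrases spanning words are not handled. */
    public static List<Word> search(PDDocument document, String needle) throws IOException {
        String n = needle.toLowerCase(Locale.ROOT);
        List<Word> hits = new ArrayList<>();
        for (Word w : extractWords(document)) {
            if (w.text.toLowerCase(Locale.ROOT).contains(n)) {
                hits.add(w);
            }
        }
        return hits;
    }
}
```

For unrotated pages, a JavaFX viewer could draw highlight rectangles for search hits over a page rendered with PDFRenderer.renderImageWithDPI(pageIndex, dpi). Both use a top-left origin, so the boxes only need scaling by dpi / 72.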
