Posted to user@poi.apache.org by Fernando Antonio Prado <ka...@hotmail.com> on 2009/07/20 19:16:23 UTC

Bookmarks on word documents

Hi there. I've just subscribed here, so this is my first email. I searched the POI site but couldn't find the answer to my problem. I've just downloaded the most recent version of HWPF and I'd like to know if there's any way I can get the bookmarks of a Word document. I'd really appreciate your help. Thanks!

_________________________________________________________________
With Windows Live, you can organize, edit and share your photos.
http://www.microsoft.com/brasil/windows/windowslive/products/photo-gallery-edit.aspx

Re: Bookmarks on word documents

Posted by MSB <ma...@tiscali.co.uk>.
No, sorry, no progress at all. It will take some work to accomplish this
because of the way Word stores bookmarks. Crudely, it uses a data structure
in which each bookmark is listed along with a number. This number indicates
where, in terms of character positions, the bookmark was actually inserted
and also where the substitution text should be placed. On one level this is
quite simple, if you ignore the complexities of locating that data structure
within the doc file, but the complications will multiply when we try to
substitute text for the bookmark at the indicated location; at least, based
upon my experience with search and replace operations, I fear this will be
the case. As yet, I have no idea what to modify, or how, when the structure
of the Word document changes. Bear in mind that the file 'contains' up to
four streams and that each stream is, at the most basic level, a linked
list, and the problems that could be caused by corrupting any of these links
become apparent.
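To make the substitution problem concrete, here is a minimal Java sketch, assuming each bookmark is reduced to a name plus start and end character positions (CPs) into the document text. The `Bookmark` class and `substitute` method are hypothetical illustrations, not HWPF API, and the sketch ignores the on-disk structures entirely; it only shows why every position after an edit has to be shifted.

```java
import java.util.List;

public class BookmarkModel {
    // Hypothetical model, not HWPF API: a bookmark name plus the character
    // positions (CPs) in the document text where it starts and ends.
    static final class Bookmark {
        final String name;
        int startCp;
        int endCp;

        Bookmark(String name, int startCp, int endCp) {
            this.name = name;
            this.startCp = startCp;
            this.endCp = endCp;
        }
    }

    // Replaces the text spanned by `target`, then shifts every CP at or after
    // the old end of that span (except the target's own start) so the other
    // bookmarks still cover the same text. Assumes no other bookmark ends
    // strictly inside the replaced span.
    static String substitute(String docText, List<Bookmark> marks,
                             Bookmark target, String replacement) {
        int oldEnd = target.endCp;
        int delta = replacement.length() - (oldEnd - target.startCp);
        String result = docText.substring(0, target.startCp)
                + replacement + docText.substring(oldEnd);
        for (Bookmark b : marks) {
            if (b != target && b.startCp >= oldEnd) {
                b.startCp += delta;
            }
            if (b.endCp >= oldEnd) {
                b.endCp += delta;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Dear NAME, your order ORDER has shipped.";
        Bookmark name = new Bookmark("name", 5, 9);
        Bookmark order = new Bookmark("order", 22, 27);
        String updated = substitute(text, List.of(name, order), name, "Fernando");
        System.out.println(updated);
        System.out.println(updated.substring(order.startCp, order.endCp));
    }
}
```

Running `main` prints "Dear Fernando, your order ORDER has shipped." and then "ORDER": the second bookmark still spans the right text only because its positions were moved by four characters when the first one grew.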

Yours

Mark B




-- 
View this message in context: http://www.nabble.com/Bookmarks-on-word-documents-tp24573916p26009080.html
Sent from the POI - User mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Bookmarks on word documents

Posted by Ajai <aj...@gmail.com>.
Hi Mark,

I am also looking at retrieving bookmarks from a word document.

Can you kindly let me know if you were able to achieve it.

Regards,
Ajai G




-- 
View this message in context: http://www.nabble.com/Bookmarks-on-word-documents-tp24573916p26006213.html
Sent from the POI - User mailing list archive at Nabble.com.




Re: Bookmarks on word documents

Posted by MSB <ma...@tiscali.co.uk>.
Made a little more progress yesterday. I do not think it will be possible to
simply search for control characters in the table stream; it is going to be
necessary to interrogate the file information block (FIB) for the offset to
the list of bookmarks. That is not as scary as it sounds, because a lot of
the work has already been done in the FIBFieldHandler class. I think I am
going to hack this by adding an associative list that links the name or ID
number of each offset field to its length in bytes. Next, I need to work out
all of the entities that can appear in the table stream before the
bookmarks, get the offset and length of each, add all of this together and
hopefully arrive at the offset for the bookmarks. Last night I had the
chance to check, and simply reading the field that holds the offset for the
bookmarks was not enough: it gave me an offset of something like 50 bytes,
whereas I know from mapping the table stream that it should be something
like 2000.
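The associative-list approach described above might look something like this minimal sketch. It assumes each offset (fc) and length (lcb) value is stored in the FIB as a little-endian 32-bit integer, and that the table-stream entities are packed back to back in the order listed. `TableStreamOffsets` is a hypothetical helper, not part of HWPF, and the entity names and lengths in `main` are made-up placeholders.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.LinkedHashMap;
import java.util.Map;

public class TableStreamOffsets {
    // Reads a little-endian 32-bit integer, the form in which the FIB stores
    // each fc (offset) and lcb (length in bytes) value. Which positions hold
    // the bookmark fields is not shown here; take them from the FIB layout.
    static int readIntLE(byte[] fib, int pos) {
        return ByteBuffer.wrap(fib, pos, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    // The "associative list": entity name -> length in bytes, kept in the
    // order the entities appear in the table stream. Summing the lengths of
    // every entity before `target` estimates where `target` starts, assuming
    // the entities are packed back to back with no gaps.
    static int estimateOffset(LinkedHashMap<String, Integer> lengths, String target) {
        int offset = 0;
        for (Map.Entry<String, Integer> entry : lengths.entrySet()) {
            if (entry.getKey().equals(target)) {
                return offset;
            }
            offset += entry.getValue();
        }
        throw new IllegalArgumentException("Unknown entity: " + target);
    }

    public static void main(String[] args) {
        // Bytes D0 07 00 00 at position 2 decode to 0x07D0 = 2000.
        byte[] fib = {0x00, 0x00, (byte) 0xD0, 0x07, 0x00, 0x00};
        System.out.println(readIntLE(fib, 2));

        // Placeholder entity names and lengths, purely for illustration.
        LinkedHashMap<String, Integer> lengths = new LinkedHashMap<>();
        lengths.put("entityA", 1200);
        lengths.put("entityB", 750);
        lengths.put("bookmarks", 96);
        System.out.println(estimateOffset(lengths, "bookmarks"));
    }
}
```

With these placeholder values `main` prints 2000 and then 1950. A `LinkedHashMap` is used because the summing only works if iteration follows the order in which the entities are laid out.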

I cannot promise to work on the problem today, as I have commitments both
during the day and in the evening, but I will post again if I make any progress.

Yours

Mark B.



-- 
View this message in context: http://www.nabble.com/Bookmarks-on-word-documents-tp24573916p24600779.html
Sent from the POI - User mailing list archive at Nabble.com.




Re: Bookmarks on word documents

Posted by MSB <ma...@tiscali.co.uk>.
So far, I have found that the bookmarks are all contained within what is
called the table stream. If you want to have a look, try running this code:

        import java.io.FileInputStream;
        import java.io.InputStream;

        import org.apache.poi.hwpf.HWPFDocument;

        public class DumpTableStream {
            public static void main(String[] args) {
                // try-with-resources closes the stream even if parsing fails
                try (InputStream fis = new FileInputStream("...your file name.......")) {
                    HWPFDocument document = new HWPFDocument(fis);
                    // Decode the raw table stream bytes as 16-bit little-endian
                    // characters and print each one with its hex value.
                    String tableStreamContents =
                            new String(document.getTableStream(), "UTF-16LE");
                    for (int i = 0; i < tableStreamContents.length(); i++) {
                        char aChar = tableStreamContents.charAt(i);
                        System.out.println("Character: " + aChar
                                + ", Hex: " + Integer.toHexString(aChar));
                    }
                }
                catch (Exception ex) {
                    ex.printStackTrace(System.out);
                }
            }
        }

and you should see a very long output in which each character is printed
along with its hex value. Scroll down far enough and you should find the
bookmarks.
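Rather than scrolling, you could search the raw table-stream bytes for a bookmark name you already know. This sketch (a hypothetical helper, not HWPF API) encodes the name as UTF-16LE and does a plain byte search. Searching bytes also finds names that start at an odd byte offset, which decoding the whole stream as UTF-16LE from byte 0 would split across characters and show as rubbish.

```java
import java.nio.charset.StandardCharsets;

public class FindInTableStream {
    // Returns the byte offset of the first occurrence of `name`, encoded as
    // UTF-16LE, within `tableStream`, or -1 if it does not occur.
    static int indexOfUtf16(byte[] tableStream, String name) {
        byte[] needle = name.getBytes(StandardCharsets.UTF_16LE);
        outer:
        for (int i = 0; i + needle.length <= tableStream.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (tableStream[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        // "AB" in UTF-16LE (41 00 42 00) starting at the odd offset 1.
        byte[] tableStream = {0x01, 0x41, 0x00, 0x42, 0x00};
        System.out.println(indexOfUtf16(tableStream, "AB"));
    }
}
```

The demo prints 1. With a real document you would call something like `indexOfUtf16(document.getTableStream(), "MyBookmark")`, where `MyBookmark` stands for a name you inserted yourself.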

The next step is to try to identify the character sequences that mark the
beginning and end of the list of bookmarks and, finally, separate each
bookmark from its neighbour.
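Once the start of the list is known, the names may not need delimiter characters at all, because each entry is length-prefixed. As I understand Microsoft's [MS-DOC] specification, bookmark names are stored in an extended string table (SttbfBkmk): the marker 0xFFFF, a 16-bit entry count, a 16-bit count of extra bytes per entry, then for each entry a 16-bit character count followed by that many UTF-16LE characters. The sketch below assumes that layout and a known starting offset; `SttbReader` is a hypothetical helper, not HWPF API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SttbReader {
    // Parses an extended string table starting at `offset`:
    // fExtend (0xFFFF), cData (entry count), cbExtra (extra bytes per entry),
    // then per entry: a 16-bit character count, the UTF-16LE characters,
    // and cbExtra bytes of extra data, which are skipped here.
    static List<String> readExtendedSttb(byte[] data, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        buf.position(offset);
        if ((buf.getShort() & 0xFFFF) != 0xFFFF) {
            throw new IllegalArgumentException("Not an extended STTB");
        }
        int count = buf.getShort() & 0xFFFF;
        int cbExtra = buf.getShort() & 0xFFFF;
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int cch = buf.getShort() & 0xFFFF;
            byte[] chars = new byte[cch * 2];
            buf.get(chars);
            names.add(new String(chars, StandardCharsets.UTF_16LE));
            buf.position(buf.position() + cbExtra);
        }
        return names;
    }

    public static void main(String[] args) {
        // Synthetic table with two names, "AB" and "C", and no extra data.
        ByteBuffer b = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        b.putShort((short) 0xFFFF).putShort((short) 2).putShort((short) 0);
        b.putShort((short) 2).putChar('A').putChar('B');
        b.putShort((short) 1).putChar('C');
        System.out.println(readExtendedSttb(b.array(), 0)); // [AB, C]
    }
}
```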

You may have to change the character encoding, which is currently hard-coded
to UTF-16LE, if the output you see looks like rubbish. Finally, I am sorry
if this all looks a little messy; HWPF is still very much in development,
and it is quite often work like this that advances the API one step further.

If I make any more progress, I will post again.

Yours

Mark B



-- 
View this message in context: http://www.nabble.com/Bookmarks-on-word-documents-tp24573916p24586415.html
Sent from the POI - User mailing list archive at Nabble.com.




Re: Bookmarks on word documents

Posted by MSB <ma...@tiscali.co.uk>.
Do you want just the bookmarks? Were they inserted into the document using
the Insert->Bookmark menu option? If so, then I may have a solution for you,
but I need to test it before I post.

A while ago, another list user, Christian Gosch, and I worked on some code
to extract the fields from a Word document and found that they were included
in the text of the document, enclosed between control characters. It may be
that bookmarks are treated in the same way; I need to dig around a little
to find out whether this is the case and, if so, which control characters
are used. If I manage to find anything out, I will post again as soon as
possible.
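For comparison, the field extraction mentioned above relies on Word marking each field in the document text with three control characters: 0x13 begins the field, 0x14 separates the field code from its result, and 0x15 ends it. Here is a minimal sketch of pulling out the field codes, assuming fields are not nested; `FieldScanner` is a hypothetical helper, and with HWPF the input text could come from something like `document.getRange().text()`.

```java
import java.util.ArrayList;
import java.util.List;

public class FieldScanner {
    // Control characters Word uses to delimit fields in the document text.
    static final char FIELD_BEGIN = 0x13;
    static final char FIELD_SEPARATOR = 0x14;
    static final char FIELD_END = 0x15;

    // Collects each field code: the text after a begin mark up to the
    // separator, or up to the end mark when a field has no separator.
    // Assumes fields are not nested.
    static List<String> fieldCodes(String text) {
        List<String> codes = new ArrayList<>();
        int begin = -1;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == FIELD_BEGIN) {
                begin = i;
            } else if ((c == FIELD_SEPARATOR || c == FIELD_END) && begin >= 0) {
                codes.add(text.substring(begin + 1, i));
                begin = -1;
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        String text = "Page " + FIELD_BEGIN + " PAGE " + FIELD_SEPARATOR + "3" + FIELD_END
                + " of " + FIELD_BEGIN + " NUMPAGES " + FIELD_END;
        System.out.println(fieldCodes(text));
    }
}
```

Nested fields (for example an IF field containing a MERGEFIELD) would need a stack of begin positions instead of the single `begin` index used here.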

Yours

Mark B



-- 
View this message in context: http://www.nabble.com/Bookmarks-on-word-documents-tp24573916p24582655.html
Sent from the POI - User mailing list archive at Nabble.com.

