Posted to dev@poi.apache.org by bu...@apache.org on 2004/08/11 20:05:24 UTC
DO NOT REPLY [Bug 30603] New: -
New code for header and footer in POI.HWPF
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=30603>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.
Summary: New code for header and footer in POI.HWPF
Product: POI
Version: 2.5
Platform: PC
OS/Version: Windows NT/2K
Status: NEW
Severity: Enhancement
Priority: Other
Component: POI Overall
AssignedTo: poi-dev@jakarta.apache.org
ReportedBy: tforbis@dynalivery.com
Someone on the POI user list suggested this is the best way to submit code for
approval. I have included a description of the OLE format so others can modify
my code if necessary.
Description:
While reading MS Word files, I discovered that there are currently no POI APIs
for dealing with the headers and footers in a document, so I set out to find
a way to access them. I looked at the files in a hex editor and also read them
and dumped them to text through POI. To my surprise, headers and footers are
handled the same way as tables: just like ordinary text. After the end of the
main document text there are eight paragraphs if the document has a header or a
footer. They contain the following text: ([ETX],,[EOT],,[ETX],,[EOT],). For
clarification, those are the Unicode characters U+0003 (End of Text) and U+0004
(End of Transmission) alternating with empty strings. The text of the headers
and footers follows, each separated by a blank paragraph. My first question
after finding that out was: if a header or footer is itself just a blank
paragraph, how can you tell the difference? The separator paragraphs contain
no tab stops, while the header and footer text contains two tab stops (why is
beyond me; I just use it to my advantage). The headers and footers are laid out
as follows: HeaderOdd, HeaderEven, FooterOdd, FooterEven, HeaderFirst,
FooterFirst; for each section. If a section does not have different even/odd
pages or a different first page, the corresponding entries are skipped. For
example, a document with three normal sections would have the same number of
"feet" (as I called them because they come at the end of the document) as a
document with one section with different first and odd/even pages. By using the
other data in the section properties, you can establish which "feet" go where
in your document.
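To make the layout concrete, here is a minimal sketch (not part of the
submitted patch) that lists which feet a section would store in the order given
above. The class name FootLayout and its method are hypothetical. It also
assumes that a "normal" section stores only HeaderOdd and FooterOdd, which is
inferred from the three-sections example rather than taken from any
specification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of POI: models the per-section order of
// "feet" described in this message.
class FootLayout {

    // Returns the names of the feet stored for one section.
    // Assumption: a section without special settings stores only the odd
    // header and odd footer; the other entries are skipped.
    static List<String> feetFor(boolean differentOddEven, boolean differentFirst) {
        List<String> feet = new ArrayList<>();
        feet.add("HeaderOdd");
        if (differentOddEven) {
            feet.add("HeaderEven");
        }
        feet.add("FooterOdd");
        if (differentOddEven) {
            feet.add("FooterEven");
        }
        if (differentFirst) {
            feet.add("HeaderFirst");
            feet.add("FooterFirst");
        }
        return feet;
    }

    public static void main(String[] args) {
        // Three normal sections versus one section with every option enabled.
        int threeNormalSections = 3 * feetFor(false, false).size();
        int oneFullSection = feetFor(true, true).size();
        System.out.println(threeNormalSections + " " + oneFullSection);
    }
}
```

Under that assumption both counts come out the same, which is why the number of
feet alone cannot tell you which foot belongs where. The section properties are
still needed for that.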
The class that I created simply builds an array of feet based on the structure
explained above. It doesn't know the section properties, so it treats all feet
the same. It also includes an iterator with functions for moving forward and
backward, as well as direct access by index if that is the preferred method. I
also modified the Range.java file to accept this type of "Range." I based the
function on the one for tables, in which you pass it the first paragraph
that is part of the Footer (which I defined as the one that contains [ETX]).
I will admit that this code is nowhere near perfect, but I think that it is a
good start towards making POI more Word-complete. If you have any questions or
comments, feel free to reply.
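The splitting idea can be tried out without POI. The following is only a
simplified model of it, not the submitted class: FootSplitter is a hypothetical
name, each paragraph is reduced to its number of tab stops, and the first eight
marker paragraphs are skipped by position. It does not model the
three-empty-paragraph terminator that the real code checks for.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not part of POI: splits the trailing region of a
// document into "feet" using the tab-stop rule described in this message.
class FootSplitter {

    // The eight [ETX]/[EOT] marker paragraphs that precede the feet.
    static final int MARKER_PARAGRAPHS = 8;

    // tabStops[i] is the number of tab stops on paragraph i of the region.
    // Returns {start, endExclusive} paragraph index pairs, one per foot.
    static List<int[]> split(int[] tabStops) {
        List<int[]> feet = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a foot"
        for (int i = MARKER_PARAGRAPHS; i < tabStops.length; i++) {
            boolean separator = tabStops[i] == 0;
            if (!separator && start < 0) {
                start = i; // first paragraph of a new foot
            } else if (separator && start >= 0) {
                feet.add(new int[] {start, i}); // separator closes the foot
                start = -1;
            }
        }
        if (start >= 0) {
            feet.add(new int[] {start, tabStops.length});
        }
        return feet;
    }

    public static void main(String[] args) {
        // Eight markers, a two-paragraph foot, a separator, a one-paragraph
        // foot, then trailing empty paragraphs.
        int[] tabStops = {0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 2, 0, 0, 0};
        for (int[] foot : split(tabStops)) {
            System.out.println(foot[0] + ".." + foot[1]);
        }
    }
}
```

A blank header still counts as a foot in this model as long as its paragraph
carries tab stops, which is the distinction described in the message.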
CODE:
package org.apache.poi.hwpf.usermodel;

/**
 * The "footer" region of a Word document.
 * This contains all the header/footer information for a Word document.
 * The constructor builds an array of ranges that contain the text of these.
 * Using the iterator, one can cycle through the sections of the document and
 * put in the appropriate headers where they belong.
 * The storage format is in order of sections and then by:
 *   Header for section
 *   Footer for section
 *   Header for first page
 *   Footer for first page
 * This class currently does not have settings for odd and even page
 * headers/footers.
 *
 * @author Tim Forbis
 */
public class Footer
  extends Range
{
  /**
   * Number of "feet" in this footer.
   * It is the total number of headers and footers in the document,
   * or -1 if the footer region exists but holds no data.
   */
  private int numFeet;

  /**
   * Used to cycle through the headers and footers.
   * 0: represents the first one in the list
   * numFeet-1: represents the last one in the array
   */
  private int iterator;

  /**
   * Array of these feet.
   * Each one is a full range that contains the paragraphs of the header/footer.
   * This "foot" can be treated the same as any other range as far as
   * formatting and treatment in a client.
   */
  private Range[] Foot;

  Footer(int startIdx, int endIdx, Range parent, int levelNum)
  {
    super(startIdx, endIdx, Range.TYPE_PARAGRAPH, parent);
    int Start, End, i=0;
    // at most six feet per section
    int[] FootStart = new int[parent.numSections()*6];
    int[] FootEnd = new int[parent.numSections()*6];
    for (int j=0; j<parent.numSections()*6; j++)
    {
      FootStart[j]=0;
      FootEnd[j]=0;
    }
    int numParagraphs = numParagraphs();
    // the footer region begins with a paragraph that starts with [ETX]
    if (this.getParagraph(0).getCharacterRun(0).text().startsWith("\u0003"))
    {
      // skip the eight marker paragraphs
      Start=8;
      this.initAll();
      End=Start+1;
      int limit = _paragraphs.size();
      if (this.getParagraph(Start).getTabList().length==0)
      {
        // there is a footer, but it has no data
        numFeet=-1;
      }
      else
      {
        do
        {
          // NOTE: this reads paragraph End+1, so it assumes that a tab-less
          // separator paragraph is found before the end of the range
          for (; End < limit; End++)
          {
            // one paragraph without any tabs is a delimiter between the
            // separate feet
            // the Word document records two tabs in each header/footer;
            // no purpose that I can see
            //if ((this.getParagraph(End+1).text()).length()<=1)
            if (this.getParagraph(End+1).getTabList().length==0)
            {
              break;
            }
          }
          FootStart[i]=Start;
          FootEnd[i]=End;
          Start=End;
          End=Start+1;
          i++;
          // three "empty" paragraphs (without tab stops listed) are the
          // delimiter for the end of the document
          //} while (!((this.getParagraph(End-2).getTabList().length==0)&&(this.getParagraph(End).getTabList().length==0)&&(this.getParagraph(End-1).getTabList().length==0)));
        } while (!(this.getParagraph(End-2).text().length()<2 &&
                   this.getParagraph(End-1).text().length()<2 &&
                   this.getParagraph(End).text().length()<2));
        numFeet=i;
        Foot=new Range[numFeet];
        for (int k=0; k<numFeet; k++)
        {
          Foot[k]=new Range(FootStart[k], FootEnd[k], TYPE_PARAGRAPH, this);
        }
        iterator=0;
      }
    }
  }

  /**
   * To be used in a <code>for</code> loop.
   *
   * @return number of headers/footers in the document foot
   */
  public int numFeet()
  {
    return numFeet;
  }

  /**
   * @return one-based position of the iterator
   */
  public int curFoot()
  {
    return iterator+1;
  }

  /**
   * Get a specific foot in the document given an index.
   * Used in a <code>for</code> loop.
   *
   * @param index zero-based location of the foot
   * @return Range Foot
   */
  public Range getFoot(int index)
  {
    // index 0 is the first foot, so only negative values are rejected here
    if ((index>=numFeet)||(index<0))
    {
      throw new ArrayIndexOutOfBoundsException("Index outside of bounds of Footer");
    }
    else
    {
      return Foot[index];
    }
  }

  /**
   * Get the foot that the iterator is currently pointed at.
   * To be used in a <code>do-while</code> loop.
   *
   * @return Range Foot
   */
  public Range getFoot()
  {
    // MovePrevious() can take the iterator below zero, so check both ends
    if ((iterator>=numFeet)||(iterator<0))
    {
      throw new ArrayIndexOutOfBoundsException("Index outside of bounds of Footer");
    }
    else
    {
      return Foot[iterator];
    }
  }

  /**
   * Start the count over at zero, and guarantee that you are at the first
   * header.
   */
  public void MoveFirst()
  {
    iterator=0;
  }

  /**
   * Increment the iterator, so that it points at the next one in line.
   * To be used in a <code>do-while</code> loop.
   */
  public void MoveNext()
  {
    iterator++;
  }

  /**
   * Decrement the iterator, so that it points at the previous one in line.
   * To be used in a <code>do-while</code> loop.
   */
  public void MovePrevious()
  {
    iterator--;
  }

  /**
   * Point at the last header in the list.
   */
  public void MoveLast()
  {
    iterator=numFeet-1;
  }

  /**
   * @return true if the iterator has moved past the last foot (End of Foot)
   */
  public boolean EOF()
  {
    return (iterator==numFeet);
  }
}
---------------------------------------------------------------------
To unsubscribe, e-mail: poi-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: poi-dev-help@jakarta.apache.org