Posted to dev@poi.apache.org by bu...@apache.org on 2004/08/11 20:05:24 UTC

DO NOT REPLY [Bug 30603] New: - New code for header and footer in POI.HWPF

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=30603>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=30603

New code for header and footer in POI.HWPF

           Summary: New code for header and footer in POI.HWPF
           Product: POI
           Version: 2.5
          Platform: PC
        OS/Version: Windows NT/2K
            Status: NEW
          Severity: Enhancement
          Priority: Other
         Component: POI Overall
        AssignedTo: poi-dev@jakarta.apache.org
        ReportedBy: tforbis@dynalivery.com


Someone on the POI-user list suggested this is the best way to submit code for
approval.  I have included a description of the OLE format so others may modify
my code if necessary.

Description:
As I was reading MS Word files, I discovered that there were no current POI APIs
for dealing with the headers and footers in my document, so I set out to find a
way to access them.  I looked at the files in a hex editor, and also read them
through POI and output them to text.  To my surprise, the header and footer are
handled the same way as tables: just like ordinary text.  After the end of the
main document text, if you have a header or a footer, there are eight
paragraphs containing the following text: ([ETX],,[EOT],,[ETX],,[EOT],).  For
clarification, those are the Unicode characters U+0003 (End of Text) and U+0004
(End of Transmission) alternating with empty strings.  Then follows the text of
the headers and footers, separated by blank paragraphs.

My first question after I found that out was this: if a header or footer is
just a blank paragraph, how can you tell it apart from a separator?  The
separator paragraphs contain no tab stops, while the header/footer text
contains two tab stops (why is beyond me; I just use it to my advantage).

The layout of the headers and footers is as follows, for each section:
HeaderOdd, HeaderEven, FooterOdd, FooterEven, HeaderFirst, FooterFirst.  If a
section does not have different odd/even pages or a different first page, the
corresponding entries are skipped.  For example, a document with three normal
sections would have the same number of "feet" (as I called them, because they
come at the end of the document) as a document with one section that has both
a different first page and different odd/even pages.  By using the other data
in the section properties, you can establish which "feet" go where in your
document.
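
To make the counting rule concrete, here is a small standalone Java sketch that lists the feet a run of sections would produce. It only restates the order described above; the names (`FootLayout`, `feetForSection`, `layout`) are hypothetical, the two per-section flags are assumed to come from the section properties, and it assumes that a section with neither option contributes just its odd (default) header and footer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists feet in the order described in the bug report. */
public class FootLayout {

    /** Feet produced by one section; the optional entries are skipped when the flag is off. */
    static List<String> feetForSection(boolean differentOddEven, boolean differentFirst) {
        List<String> feet = new ArrayList<>();
        feet.add("HeaderOdd");
        if (differentOddEven) feet.add("HeaderEven");
        feet.add("FooterOdd");
        if (differentOddEven) feet.add("FooterEven");
        if (differentFirst) {
            feet.add("HeaderFirst");
            feet.add("FooterFirst");
        }
        return feet;
    }

    /**
     * Concatenates the feet of all sections in document order.
     * Each row of {@code sections} is assumed to be {differentOddEven, differentFirst}.
     */
    static List<String> layout(boolean[][] sections) {
        List<String> all = new ArrayList<>();
        for (int s = 0; s < sections.length; s++) {
            for (String label : feetForSection(sections[s][0], sections[s][1])) {
                all.add("section " + s + ": " + label);
            }
        }
        return all;
    }

    public static void main(String[] args) {
        boolean[][] threePlain = { {false, false}, {false, false}, {false, false} };
        boolean[][] oneFull = { {true, true} };
        System.out.println(layout(threePlain).size()); // 6
        System.out.println(layout(oneFull).size());    // 6
        System.out.println(layout(oneFull));
    }
}
```

Both documents yield six feet, which is why the feet cannot be mapped back to sections without also reading the section properties.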

The class that I created just builds an array of feet based on the structure I
explained above.  It doesn't know the section properties, so it treats them all
the same.  It also includes an iterator with functions for moving forward and
backward, as well as direct access if that is the preferred method.  I also
modified the Range.java file to accept this type of "Range."  I based the
function on the one for tables, in which you pass it the first paragraph that
is part of the Footer (which I classified as the one that contains [ETX]).
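
The array-building step can be illustrated without an HWPF document. This is a minimal standalone sketch, not the submitted class: `Para` is a hypothetical stand-in for a paragraph (its text plus its number of tab stops), `findFooterStart` plays the role of the Range.java hook (the first paragraph starting with [ETX]), and `splitFeet` skips the eight [ETX]/[EOT] paragraphs and treats every tab-less paragraph as a separator. Unlike the submitted constructor, it simply runs to the end of the list rather than stopping at three consecutive near-empty paragraphs.

```java
import java.util.ArrayList;
import java.util.List;

public class FeetSplitter {

    /** Hypothetical stand-in for an HWPF paragraph: its text and number of tab stops. */
    static final class Para {
        final String text;
        final int tabStops;
        Para(String text, int tabStops) { this.text = text; this.tabStops = tabStops; }
    }

    /** Index of the first paragraph whose text starts with ETX (U+0003), or -1 if none. */
    static int findFooterStart(List<Para> paras) {
        for (int i = 0; i < paras.size(); i++) {
            if (paras.get(i).text.startsWith("\u0003")) return i;
        }
        return -1;
    }

    /** Returns [start, end) paragraph index pairs, one per foot. */
    static List<int[]> splitFeet(List<Para> paras) {
        List<int[]> feet = new ArrayList<>();
        int start = findFooterStart(paras);
        if (start < 0) return feet;              // no header/footer area at all
        int i = start + 8;                       // skip [ETX],,[EOT],,[ETX],,[EOT],
        while (i < paras.size()) {
            if (paras.get(i).tabStops == 0) {    // separator paragraph
                i++;
                continue;
            }
            int footStart = i;                   // a foot is a run of tabbed paragraphs
            while (i < paras.size() && paras.get(i).tabStops > 0) i++;
            feet.add(new int[] { footStart, i });
        }
        return feet;
    }

    /** Hypothetical sample: one body paragraph, the eight-paragraph prefix, then two feet. */
    static List<Para> sampleDocument() {
        List<Para> doc = new ArrayList<>();
        doc.add(new Para("Body text\r", 0));
        String[] prefix = {"\u0003\r", "\r", "\u0004\r", "\r", "\u0003\r", "\r", "\u0004\r", "\r"};
        for (String p : prefix) {
            doc.add(new Para(p, 0));
        }
        doc.add(new Para("Header line\r", 2)); // foot 0: header text with two tab stops
        doc.add(new Para("\r", 0));            // separator: no tab stops
        doc.add(new Para("\r", 2));            // foot 1: blank footer, still has two tab stops
        doc.add(new Para("\r", 0));            // separator
        return doc;
    }

    public static void main(String[] args) {
        for (int[] f : splitFeet(sampleDocument())) {
            System.out.println(f[0] + ".." + f[1]);
        }
    }
}
```

For the sample document, `splitFeet` returns the half-open ranges 9..10 and 11..12: the one-line header and the blank footer, which is kept because it still carries tab stops.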

I will admit that this code is nowhere near perfect, but I think it is a good
start towards making POI more Word-complete.  If you have any questions or
comments, feel free to reply.

CODE:
package org.apache.poi.hwpf.usermodel;

/**
 * Represents the footer area of a Word document, which contains all of the
 * header/footer information for the document.  It goes through that area and
 * builds an array of ranges containing the text of each header and footer.
 * Using the iterator, one can cycle through the sections of the document and
 * put the appropriate headers where they belong.
 * The storing format is in order of sections and then by:
 *   Header for section
 *   Footer for section
 *   Header for First page
 *   Footer for First Page
 * This class currently does not have settings for odd and even page
 * headers/footers.
 *
 * @author Tim Forbis
 */
public class Footer
  extends Range
{
    
  /**
   * Number of "feet" in this footer: the total number of headers and
   * footers in the document.
   */
  private int numFeet;

  /**
   * Used to cycle through the headers and footers.
   * 0: the first one in the list
   * numFeet-1: the last one in the array
   */
  private int iterator;

  /**
   * Array of these feet.
   * Each one is a full range that contains the paragraphs of the
   * header/footer.  A "foot" can be treated the same as any other range as
   * far as formatting and handling in a client are concerned.
   * Stays empty (never null) if no header/footer area is found.
   */
  private Range[] Foot = new Range[0];
  
   
  Footer(int startIdx, int endIdx, Range parent, int levelNum)
  {
    super(startIdx, endIdx, Range.TYPE_PARAGRAPH, parent);
    int Start, End, i=0;
    // Java zero-initialises int arrays, so no explicit clearing loop is needed.
    int[] FootStart = new int[parent.numSections()*6];
    int[] FootEnd = new int[parent.numSections()*6];
    if (this.getParagraph(0).getCharacterRun(0).text().startsWith("\u0003"))
    {
      // skip the eight [ETX],,[EOT],,[ETX],,[EOT], paragraphs
      Start=8;
      this.initAll();
      End=Start+1;
      int limit = _paragraphs.size();
      if (this.getParagraph(Start).getTabList().length==0)
      {
        // there is a footer area, but it has no data: expose zero feet
        numFeet=0;
      }
      else
      {
        do
        {
          // one paragraph without any tabs is a delimiter between the separate
          // feet; Word records two tabs in each header/footer (no purpose that
          // I can see).  Stop before reading past the last paragraph.
          for (; End + 1 < limit; End++)
          {
            if (this.getParagraph(End+1).getTabList().length==0)
            {
              break;
            }
          }
          FootStart[i]=Start;
          FootEnd[i]=End;
          Start=End;
          End=Start+1;
          i++;
          // three "empty" paragraphs (text shorter than two characters) mark
          // the end of the document; also stop when we run out of paragraphs
          // or of slots in the FootStart/FootEnd arrays
        } while (End < limit && i < FootStart.length &&
                 !(this.getParagraph(End-2).text().length()<2 &&
                   this.getParagraph(End-1).text().length()<2 &&
                   this.getParagraph(End).text().length()<2));
        numFeet=i;
        Foot=new Range[numFeet];
        for (int k=0; k<numFeet; k++)
        {
          Foot[k]=new Range(FootStart[k], FootEnd[k], TYPE_PARAGRAPH, this);
        }
        iterator=0;
      }
    }
  }
  
  /**
   * @return number of headers/footers in the document foot,
   * to be used in a <code>for</code> loop
   */
  public int numFeet()
  {
    return numFeet;
  }

  /**
   * @return 1-based position of the foot the iterator currently points at
   */
  public int curFoot()
  {
    return iterator+1;
  }

  /**
   * Get a specific foot in the document given an index,
   * used in a <code>for</code> loop.
   *
   * @param index zero-based location of the foot
   * @return Range the requested foot
   */
  public Range getFoot(int index)
  {
    // index 0 is a valid foot, so only negative indexes are rejected
    if ((index>=numFeet)||(index<0))
    {
      throw new ArrayIndexOutOfBoundsException("Index outside of bounds of Footer");
    }
    return Foot[index];
  }
  
  
  
  /**
   * Get the foot that the iterator is currently pointed at,
   * to be used in a <code>do-while</code> loop.
   *
   * @return Range the current foot
   */
  public Range getFoot()
  {
    // MovePrevious() can take the iterator below zero, so check both ends
    if ((iterator>=numFeet)||(iterator<0))
    {
      throw new ArrayIndexOutOfBoundsException("Index outside of bounds of Footer");
    }
    return Foot[iterator];
  }
  
  /**
   * Start the count over at zero, guaranteeing that you are at the first
   * header.
   */
  public void MoveFirst()
  {
    iterator=0;
  }
  
  /**
   * Increment the iterator so that it points at the next one in line,
   * to be used in a <code>do-while</code> loop.
   */
  public void MoveNext()
  {
    iterator++;
  }
  
  /**
   * Decrement the iterator so that it points at the previous one in line,
   * to be used in a <code>do-while</code> loop.
   */
  public void MovePrevious()
  {
    iterator--;
  }
  
  /**
   * point at the last header in the list
   */
  public void MoveLast()
  {
    iterator=numFeet-1;
  }

  /**
   * @return true once the iterator has moved past the last foot
   */
  public boolean EOF()
  {
    return (iterator>=numFeet);
  }
  
}
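
The Javadoc suggests a do-while loop over the cursor methods, but a do-while calls getFoot() before checking EOF(), so it throws when a document has no feet at all. The standalone sketch below models the same cursor contract over a plain array so the loop can be run without an HWPF document; the class `FootCursor` and its lowercase methods are hypothetical counterparts of `MoveFirst`, `MoveNext`, `EOF` and `getFoot`, and the loop tests the end condition before each read.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Footer's cursor methods, backed by a plain array. */
public class FootCursor<T> {
    private final T[] feet;
    private int iterator = 0;

    public FootCursor(T[] feet) { this.feet = feet; }

    public int numFeet()    { return feet.length; }
    public void moveFirst() { iterator = 0; }               // like MoveFirst()
    public void moveNext()  { iterator++; }                 // like MoveNext()
    public boolean eof()    { return iterator >= feet.length; } // like EOF()

    /** Like getFoot(): returns the current foot or throws when out of range. */
    public T getFoot() {
        if (iterator < 0 || iterator >= feet.length) {
            throw new ArrayIndexOutOfBoundsException("Index outside of bounds of Footer");
        }
        return feet[iterator];
    }

    /** Visits every foot, checking eof() before each read so zero feet is safe. */
    static <T> List<T> collect(FootCursor<T> cursor) {
        List<T> out = new ArrayList<>();
        for (cursor.moveFirst(); !cursor.eof(); cursor.moveNext()) {
            out.add(cursor.getFoot());
        }
        return out;
    }

    public static void main(String[] args) {
        FootCursor<String> cursor = new FootCursor<>(new String[] {"Header", "Footer"});
        System.out.println(collect(cursor));                                        // [Header, Footer]
        System.out.println(collect(new FootCursor<String>(new String[0])).size()); // 0
    }
}
```

The same shape, `for (footer.MoveFirst(); !footer.EOF(); footer.MoveNext())`, would apply to the submitted Footer class, assuming EOF() reports true once the iterator has passed the last foot.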
