Posted to commits@poi.apache.org by se...@apache.org on 2011/08/09 08:47:01 UTC

svn commit: r1155227 - /poi/trunk/src/documentation/content/xdocs/hwpf/index.xml

Author: sergey
Date: Tue Aug  9 06:47:01 2011
New Revision: 1155227

URL: http://svn.apache.org/viewvc?rev=1155227&view=rev
Log:
more HWPF documentation

Modified:
    poi/trunk/src/documentation/content/xdocs/hwpf/index.xml

Modified: poi/trunk/src/documentation/content/xdocs/hwpf/index.xml
URL: http://svn.apache.org/viewvc/poi/trunk/src/documentation/content/xdocs/hwpf/index.xml?rev=1155227&r1=1155226&r2=1155227&view=diff
==============================================================================
--- poi/trunk/src/documentation/content/xdocs/hwpf/index.xml (original)
+++ poi/trunk/src/documentation/content/xdocs/hwpf/index.xml Tue Aug  9 06:47:01 2011
@@ -48,15 +48,63 @@
      either have a recent SVN checkout, or a recent SVN nightly build
      (including the scratchpad jar!)</p>
 
-  <p>Source in the
-     <em>org.apache.poi.hwpf.model</em> tree is the old legacy code refactored
-     into an object model. Source code in the
-     <em>org.apache.poi.hwpf.extractor</em> tree is a wrapper of this to
-     facilitate easy extraction of interesting things (eg the Text). 
-     Source code in the <em>org.apache.poi.hdf</em> tree is the old legacy
-     code.
-   </p>
+    <p>
+        Source code in the
+        <em>org.apache.poi.hdf</em>
+        tree is the old legacy code. Source in the
+        <em>org.apache.poi.hwpf.model</em>
+        tree is the old legacy code refactored into a new object model. Those packages contain
+        Java representations of the internal Word format structures. This code is "internal" and should not
+        be used by your code. For backward-compatibility reasons, some APIs still have references to
+        those packages. They are subject to deprecation and removal. Code in the
+        <em>org.apache.poi.hwpf.usermodel</em>
+        package is the actual public and (as far as possible) user-friendly API for accessing document
+        parts. Source code in the
+        <em>org.apache.poi.hwpf.extractor</em>
+        tree is a wrapper around this to facilitate easy extraction of interesting things (e.g. the text),
+        and the
+        <em>org.apache.poi.hwpf.converter</em>
+        package contains Word-to-HTML and Word-to-FO converters (the latter can be used to generate PDF
+        from Word files when used with
+        <a href="http://xmlgraphics.apache.org/fop/">Apache FOP</a>
+        ). There is also a small file-structure-dumping utility in the
+        <em>org.apache.poi.hwpf.dev</em>
+        package, primarily for development purposes.
+    </p>
+
+    <p>
+        The main entry point to HWPF is HWPFDocument. Currently it has many references both to
+        the internal interfaces (the
+        <em>org.apache.poi.hwpf.model</em>
+        package) and to the public API (the
+        <em>org.apache.poi.hwpf.usermodel</em>
+        package). It is possible that it will be split into two different interfaces (like WordFile
+        and WordDocument) in later versions.
+    </p>
+
+    <p>A Word document can be considered a single, very long text buffer. The HWPF API provides "pointers"
+        to document parts such as sections, paragraphs and character runs. Usually a user will iterate
+        over the sections of the main document part, the paragraphs of each section and the character runs of each
+        paragraph. Each such interface is a pointer to a subrange of the document text along with additional
+        properties (and they all extend the same Range parent class). There are additional Range
+        implementations such as Table, TableRow, TableCell, etc. Some structures, such as Bookmark or Field,
+        can also provide subrange pointers.
+    </p>
+
+    <p>Changing file content usually requires many synchronized changes in those structures, such as
+        updating property boundaries, position handlers, etc. Because of that, the HWPF API should be
+        considered not thread-safe. In addition, there is a "one pointer" rule for changing
+        content: you should not use two different Range instances at the same time. More
+        precisely, if you change file content using some range pointer, all other range
+        pointers except its parents become invalid. For example, if you obtain the overall range (1),
+        a paragraph range (2) from the overall range and a character run range (3) from the paragraph range, and
+        then change the text of the paragraph, the character run range is now invalid and should not be used, but
+        the overall range pointer is still valid. Each time you obtain a range (pointer), a new instance is
+        created. This means that if you obtain two range pointers and change the document text using the first
+        range pointer, the second one becomes invalid.
+    </p>
 
+   </section>
    <section>
     <title>XWPF Patches Required!</title>
 


