Posted to user@poi.apache.org by Christian Gosch <ch...@inovex.de> on 2009/02/26 18:15:16 UTC

RE: Q: How to check if a Word .doc file is a mail merge master file?

Hello, MSB [markbrdsly],

To answer the last one first: I do not know whether there is any useful
internal/technical difference, but Word itself does recognize it: if you
open a document prepared as a mail merge master file, Word knows that it
is one and, e.g., displays the mail merge ribbon/toolbar.

The first question should not be answerable without the second one (or
rather, answering the second one answers the first implicitly): if there
are mail merge fields inside a document, it is usually meant to be a mail
merge master document. (To be honest, I do not know what this kind of
document is officially called in English; in German it is called
"Seriendruck-Hauptdokument", roughly "mail merge main document".)

By the way: when, i.e. in which version, was the class in question
introduced? I currently use POI 3.2-FINAL-20081019 and cannot find any
hwpf package or WordExtractor class, and I am stuck with JDK 1.4.2 because
our runtime is IBM WebSphere 6.0...

Thanks anyway,
Christian
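Christian's heuristic (a document that contains merge fields is a merge master) and the validation from the original question can be sketched as plain string processing. In the binary .doc format a field is stored as a field code framed by the control characters 0x13 (begin), 0x14 (separator) and 0x15 (end), and a merge field's code looks like MERGEFIELD FirstName \* MERGEFORMAT. The sketch below assumes you already have the document text with those markers still in place, for example from HWPF's WordExtractor; whether a particular POI version keeps or strips them is an assumption to test, not something confirmed in this thread. MergeFieldCheck and its methods are hypothetical names, and the code avoids generics so it would also compile on JDK 1.4.2.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: find MERGEFIELD names in raw .doc text and validate
 * them against an allowed set. JDK 1.4 compatible (no generics).
 * Assumes the field markers 0x13 (begin), 0x14 (separator) and 0x15 (end)
 * are still present in the text.
 */
class MergeFieldCheck {

    // A field code starts at 0x13; after optional spaces comes MERGEFIELD and
    // the field name, either quoted ("Last Name") or a single token that ends
    // at whitespace, a backslash switch, or a 0x14/0x15 marker.
    private static final Pattern MERGE_FIELD = Pattern.compile(
        "\\x13\\s*MERGEFIELD\\s+(?:\"([^\"]*)\"|([^\\s\\x14\\x15\\\\]+))",
        Pattern.CASE_INSENSITIVE);

    /** Returns the distinct merge field names, sorted (a raw Set of Strings). */
    static Set findMergeFieldNames(String rawText) {
        Set names = new TreeSet();
        Matcher m = MERGE_FIELD.matcher(rawText);
        while (m.find()) {
            // group 1 = quoted name, group 2 = unquoted name
            names.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return names;
    }

    /** Heuristic only: true if at least one MERGEFIELD is present. */
    static boolean looksLikeMergeMaster(String rawText) {
        return !findMergeFieldNames(rawText).isEmpty();
    }

    /** Field names not in allowedNames (exact, case-sensitive comparison). */
    static Set unknownFields(String rawText, Set allowedNames) {
        Set unknown = new TreeSet(findMergeFieldNames(rawText));
        unknown.removeAll(allowedNames);
        return unknown;
    }
}
```

Treat looksLikeMergeMaster() as a heuristic only: Word may keep the merge-master state elsewhere in the file, and this check does not look there. For the validation case, an empty unknownFields() result means every merge field found is in the allowed set.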

> -----Original Message-----
> From: MSB [mailto:markbrdsly@tiscali.co.uk]
> Sent: Thursday, February 26, 2009 6:00 PM
> To: user@poi.apache.org
> Subject: Re: Q: How to check if a Word .doc file is a mail merge master file?
> 
> 
> Hello Christian,
> 
> I would guess that the answer to your second question is yes. It is
> possible to use HWPF to extract the data from a Word document; in fact,
> Nick has built a class that does just this, and it is called
> WordExtractor, I think. It returns an array of Strings, if I remember
> correctly, and it would not be too difficult to imagine that you could
> check the complete set of values returned: if, and only if, that complete
> set was limited to your 'table structure' (if I understand that
> correctly), then the document would pass your validation test.
> 
> To answer your first question, I need to ask another one: what set of
> criteria distinguishes a mail merge master file from any other document
> or document template that could be created using Word? If you are able
> to formulate such a list, then it would be possible to determine whether
> HWPF could be used to parse the Word file and determine its status.
> 
> 
> Christian Gosch-2 wrote:
> >
> > Is it possible using POI to check if a given Word *.doc file
> > (Word2K/2003) is a Mail Merge master file?
> >
> > Is it then possible to retrieve or find by inspection the mail merge
> > data field references used in the mail merge master file?
> >
> > We do not need to change anything; we just want to check whether a
> > given file is a valid mail merge master and matches a given, known
> > "table structure", i.e. uses only a given set of mail merge data field
> > references (validation).
> >
> > Up to now, our validation just checks the file extension and does not
> > perform any introspection.
> >
> > Thanks for answers,
> > --
> > Dipl.-Inform. Christian Gosch, PMI PMP
> > Systems Architecture, Project Management
> >
> > inovex GmbH
> > Pforzheim office
> > Karlsruher Strasse 71
> > D-75179 Pforzheim
> > Tel: +49 (0)7231 3191-85
> > Fax: +49 (0)7231 3191-91
> > c.gosch@inovex.de
> > www.inovex.de
> >
> > Registered office: Pforzheim
> > AG Mannheim, HRB 502126
> > Managing director: Stephan Müller
> 
> --
> View this message in context: http://www.nabble.com/Q%3A-How-to-check-if-a-Word-.doc-file-is-a-mail-merge-master-file--tp22220571p22228552.html
> Sent from the POI - User mailing list archive at Nabble.com.



---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


RE: Q: How to check if a Word .doc file is a mail merge master file?

Posted by Christian Gosch <ch...@inovex.de>.
Thanks for helping me in my blindness :-(

Indeed, it is part of the distribution...

Christian

> -----Original Message-----
> From: Nick Burch [mailto:nick@torchbox.com]
> Sent: Friday, February 27, 2009 12:03 PM
> To: POI Users List
> Subject: RE: Q: How to check if a Word .doc file is a mail merge master file?
> 
> On Fri, 27 Feb 2009, Christian Gosch wrote:
> > How do I get a ready-to-use version of this JAR the simplest way?
> 
> http://poi.apache.org/ then click Download from the left hand menu
> 
> > What about support for good old JDK 1.4.2 in such a JAR?
> 
> For JDK 1.4 support, you'll need POI 3.2. We moved to a minimum of JDK 1.5
> with POI 3.5.
> 
> Nick





RE: Q: How to check if a Word .doc file is a mail merge master file?

Posted by Nick Burch <ni...@torchbox.com>.
On Fri, 27 Feb 2009, Christian Gosch wrote:
> How do I get a ready-to-use version of this JAR the simplest way?

http://poi.apache.org/ then click Download from the left hand menu

> What about support for good old JDK 1.4.2 in such a JAR?

For JDK 1.4 support, you'll need POI 3.2. We moved to a minimum of JDK 1.5
with POI 3.5.

Nick
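Nick's version rule can be checked at runtime before choosing a POI release. The helper below is a hypothetical sketch: it reads the java.specification.version system property ("1.4", "1.5", ... on older JVMs; "9", "11", ... from Java 9 on) and reports whether the JVM meets the JDK 1.5 minimum of POI 3.5; below that, POI 3.2 is the line to stay on. It only encodes the two facts stated in this message, not the Java requirements of any later POI release.

```java
/**
 * Hypothetical sketch encoding the version rule from this thread:
 * POI 3.2 still runs on JDK 1.4; POI 3.5 requires at least JDK 1.5.
 * Uses no JDK 1.5 features, so it can run on the old runtime itself.
 */
class PoiRuntimeCheck {

    /**
     * Major Java version from a java.specification.version string:
     * "1.4" -> 4, "1.5" -> 5 (old scheme), "11" -> 11 (Java 9+ scheme).
     */
    static int majorJavaVersion(String specVersion) {
        String s = specVersion.trim();
        if (s.startsWith("1.")) {
            s = s.substring(2);          // drop the legacy "1." prefix
        }
        int dot = s.indexOf('.');
        if (dot >= 0) {
            s = s.substring(0, dot);     // keep only the major number
        }
        return Integer.parseInt(s);
    }

    /** True if this version meets POI 3.5's JDK 1.5 minimum; else stay on POI 3.2. */
    static boolean meetsPoi35Minimum(String specVersion) {
        return majorJavaVersion(specVersion) >= 5;
    }

    /** Applies the check to the JVM this code is running on. */
    static boolean currentJvmMeetsPoi35Minimum() {
        return meetsPoi35Minimum(System.getProperty("java.specification.version"));
    }
}
```

On a WebSphere 6.0 runtime reporting "1.4", currentJvmMeetsPoi35Minimum() returns false, which matches Nick's advice to stay on POI 3.2.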



RE: Q: How to check if a Word .doc file is a mail merge master file?

Posted by Christian Gosch <ch...@inovex.de>.
Taking the risk that this question has already been answered in several
places here and on the POI site: what is the simplest way to get a
ready-to-use version of this JAR? Do I have to download the sources using
svn and build them with maven in the right version, etc., or is there any
way to get a nightly-built JAR file?

What about support for good old JDK 1.4.2 in such a JAR?

Thanks,
Christian

> -----Original Message-----
> From: David Fisher [mailto:dfisher@jmlafferty.com]
> Sent: Thursday, February 26, 2009 8:40 PM
> To: POI Users List
> Subject: Re: Q: How to check if a Word .doc file is a mail merge master file?
> 
> Hi -
> 
> HWPF is in the scratchpad JAR: poi-scratchpad-3.2-FINAL-20081019.jar
> 
> It was first loaded into the source three years ago.
> 
> 
> http://svn.apache.org/viewvc/poi/trunk/src/scratchpad/src/org/apache/poi/hwpf/extractor/WordExtractor.java?view=log
> 
> Regards,
> Dave
> 
> On Feb 26, 2009, at 9:15 AM, Christian Gosch wrote:
> 
> > By the way: When / with which version was the method in question
> > introduced? Currently I use POI 3.2-FINAL-20081019 and cannot find any
> > hwpf package or WordExtractor class, and I'm stuck to JDK 1.4.2 due to
> > IBM WebSphere 6.0 as runtime...





Re: Q: How to check if a Word .doc file is a mail merge master file?

Posted by David Fisher <df...@jmlafferty.com>.
Hi -

HWPF is in the scratchpad JAR: poi-scratchpad-3.2-FINAL-20081019.jar

It was first loaded into the source three years ago.

http://svn.apache.org/viewvc/poi/trunk/src/scratchpad/src/org/apache/poi/hwpf/extractor/WordExtractor.java?view=log

Regards,
Dave

On Feb 26, 2009, at 9:15 AM, Christian Gosch wrote:

> By the way: When / with which version was the method in question
> introduced? Currently I use POI 3.2-FINAL-20081019 and cannot find any
> hwpf package or WordExtractor class, and I'm stuck to JDK 1.4.2 due to
> IBM WebSphere 6.0 as runtime...
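Since the hwpf package lives in the separate scratchpad JAR, a quick runtime check shows whether it is actually visible to the application, for example inside a WebSphere deployment. This is a hypothetical sketch that uses only Class.forName; the class name it checks is the WordExtractor path from the SVN link above.

```java
/**
 * Hypothetical sketch: is HWPF (in the POI scratchpad JAR) on the classpath?
 * Only uses Class.forName, which is available on JDK 1.4.
 */
class ScratchpadCheck {

    static final String WORD_EXTRACTOR = "org.apache.poi.hwpf.extractor.WordExtractor";

    /** True if the class can be loaded; it is not initialized. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ScratchpadCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;    // JAR not on the classpath
        } catch (LinkageError e) {
            return false;    // found but not loadable, e.g. built for a newer JDK
        }
    }

    static boolean hasHwpf() {
        return isOnClasspath(WORD_EXTRACTOR);
    }
}
```

If hasHwpf() returns false while the core POI JAR is present, the poi-scratchpad JAR is most likely missing from the classpath.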




RE: Q: How to check if a Word .doc file is a mail merge master file?

Posted by MSB <ma...@tiscali.co.uk>.
It seems correct, then, to assume that Word inserts some information, most
likely metadata, into the .doc file that it uses to recognise that a file is
a mail merge master? Whilst I am in no way an expert, I very much doubt
that HWPF will be able to read this and recognise it. What it can do, however,
is expose both the DocumentProperties and its FileInformationBlock; I have
capitalised these names as there are classes of those names in the hwpf.model
package. As a first step, it might be worth creating two files, one a merge
master, the other an ordinary Word document, and seeing how the properties
and file information differ. Just as an aside, I have had a quick look at
the javadoc and there are two get and two set methods in the
DocumentProperties class that have the word 'merge' in their names. I do not
know what they are for, but they could be a good starting point.
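The comparison suggested above (a merge master versus an ordinary document) can be mechanised with plain reflection: dump every public no-argument getter of two objects of the same class and list the getters whose values differ. This is a hypothetical, POI-agnostic sketch; how you obtain the two DocumentProperties or FileInformationBlock instances (for instance via HWPFDocument's getDocProperties() and getFileInformationBlock()) is an assumption to check against the javadoc of your POI version.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical sketch: compare the public no-arg getters of two objects of
 * the same class and report which ones differ. No POI calls; JDK 1.4 style.
 */
class PropertyDiff {

    /** Getter name -> string form of its value (or of the exception it threw). */
    static Map dumpGetters(Object bean) {
        Map values = new TreeMap();
        Method[] methods = bean.getClass().getMethods();
        for (int i = 0; i < methods.length; i++) {
            Method m = methods[i];
            String name = m.getName();
            boolean isGetter = (name.startsWith("get") || name.startsWith("is"))
                && m.getParameterTypes().length == 0
                && m.getReturnType() != Void.TYPE
                && !Modifier.isStatic(m.getModifiers())
                && !name.equals("getClass");
            if (!isGetter) {
                continue;
            }
            String value;
            try {
                value = describe(m.invoke(bean, new Object[0]));
            } catch (Exception e) {
                value = "<threw " + e.getClass().getName() + ">";
            }
            values.put(name, value);
        }
        return values;
    }

    /** String form used for comparison; arrays are expanded element by element. */
    static String describe(Object value) {
        if (value == null || !value.getClass().isArray()) {
            return String.valueOf(value);
        }
        StringBuffer sb = new StringBuffer("[");
        for (int i = 0; i < Array.getLength(value); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(describe(Array.get(value, i)));
        }
        return sb.append(']').toString();
    }

    /** Names of getters whose values differ between a and b. */
    static Set diff(Object a, Object b) {
        Map va = dumpGetters(a);
        Map vb = dumpGetters(b);
        Set changed = new TreeSet();
        Iterator it = va.keySet().iterator();
        while (it.hasNext()) {
            Object key = it.next();
            if (!va.get(key).equals(vb.get(key))) {
                changed.add(key);
            }
        }
        return changed;
    }
}
```

Values are compared by their string form, so getters returning objects without a meaningful toString() will always show up as different and need a closer look by hand.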

With regard to the WordExtractor class, you need the scratchpad JAR, as
David suggests, to get your hands on it.

Finally, I have been thinking about my previous reply and I neglected to
mention that I was making a HUGE assumption. I assumed that the bookmarks
would also be recognised as text, not simply as some special series of
control characters. If that is the case, it should be possible to recover
them from the document and perform the sorts of comparisons you need to
undertake. That assumption will be very easy for you to test once you get
your hands on HWPF: simply run the WordExtractor class against a mail merge
document and see what the class returns. Even if that class does not give
you exactly what you want, you can still inspect the document further, as
there are other sorts of objects bound up within the document.
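To run the test described above, it helps to print the extracted text with Word's control characters made visible, so you can see whether merge fields arrive as field codes, as plain result text, or not at all. The helper below is a hypothetical sketch that tags the field markers 0x13, 0x14 and 0x15 and shows any other control character (Word uses 0x07 for table cell marks, for example) as hex.

```java
/**
 * Hypothetical sketch: make Word's control characters visible in extracted
 * text so you can see how merge fields come out. JDK 1.4 compatible.
 */
class FieldMarkerDump {

    static String showFieldMarkers(String text) {
        StringBuffer out = new StringBuffer(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case 0x13: out.append("{BEGIN}"); break;   // field begin
                case 0x14: out.append("{SEP}");   break;   // field separator
                case 0x15: out.append("{END}");   break;   // field end
                default:
                    if (c < 0x20 && c != '\n' && c != '\r' && c != '\t') {
                        // any other control character, e.g. 0x07 (table cell mark)
                        out.append("{0x").append(Integer.toHexString(c)).append('}');
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }
}
```

One way to use it, assuming WordExtractor's getParagraphText() is available in your POI version, is to pass each returned paragraph through showFieldMarkers() and print the result.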



-- 
View this message in context: http://www.nabble.com/Q%3A-How-to-check-if-a-Word-.doc-file-is-a-mail-merge-master-file--tp22220571p22241109.html
Sent from the POI - User mailing list archive at Nabble.com.

