Posted to dev@poi.apache.org by Fabian Lange <fa...@codecentric.de> on 2011/12/05 12:00:10 UTC

Patch for POI parsing Smart Tags from DOCX

Hi,

While browsing the Tika Jira I found that they have issues with
SmartTags in DOCX documents.
https://issues.apache.org/jira/browse/TIKA-526

I tracked it down to org.apache.poi.xwpf.usermodel.XWPFParagraph,
which ignores child elements of type CTSmartTagRun.

I have a patch, which recursively collects the text from smart tags,
and this fixes the Tika issues.

<w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags"
w:element="PlaceName">
	<w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags"
w:element="place">
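
To illustrate the recursive idea, here is a minimal sketch that uses only the JDK's DOM parser instead of POI's XMLBeans classes. It is not the submitted patch: the class name SmartTagTextSketch, its methods, and the sample paragraph are hypothetical and exist only for this example. It assumes that the text of interest sits in w:t elements inside w:r runs, and that runs may be wrapped in any number of nested w:smartTag elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; not part of POI.
class SmartTagTextSketch {
    // WordprocessingML main namespace, normally bound to the "w" prefix.
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Appends the text of every w:t found under the given node, in document
    // order, descending through w:r and (recursively) w:smartTag elements.
    static void collectText(Node node, StringBuilder out) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() != Node.ELEMENT_NODE
                    || !W_NS.equals(c.getNamespaceURI())) {
                continue; // skip whitespace nodes and foreign elements
            }
            String name = c.getLocalName();
            if ("t".equals(name)) {
                out.append(c.getTextContent());
            } else if ("r".equals(name) || "smartTag".equals(name)) {
                collectText(c, out); // smart tags may nest to any depth
            }
        }
    }

    // Parses a single w:p element given as a string and returns its text.
    static String paragraphText(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // required for getLocalName()
            Document doc = f.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            StringBuilder sb = new StringBuilder();
            collectText(doc.getDocumentElement(), sb);
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parseable paragraph", e);
        }
    }

    public static void main(String[] args) {
        String uri = "urn:schemas-microsoft-com:office:smarttags";
        // Invented sample paragraph with two levels of smart tags.
        String xml =
            "<w:p xmlns:w=\"" + W_NS + "\">"
            + "<w:r><w:t>Visit </w:t></w:r>"
            + "<w:smartTag w:uri=\"" + uri + "\" w:element=\"place\">"
            + "<w:smartTag w:uri=\"" + uri + "\" w:element=\"PlaceName\">"
            + "<w:r><w:t>Springfield</w:t></w:r>"
            + "</w:smartTag>"
            + "<w:r><w:t> today</w:t></w:r>"
            + "</w:smartTag>"
            + "</w:p>";
        System.out.println(paragraphText(xml));
    }
}
```

A reader that only looks at the direct w:r children of the paragraph would see just the first run in this sample; descending into w:smartTag is what recovers the rest.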

However, I noticed that there are no tests at all for hwpf, which
makes me not very confident in my change :-)

a) Is it correct that there are no testcases here:
C:\repo\poi\src\testcases\org\apache\poi\hwpf
b) Whom can I contact to get feedback on my patch?
c) If it is confirmed that no tests exist, can my patch still make it
into beta5?

Regards,
Fabian

--
Fabian Lange | Head of Competence Center Performance

codecentric AG | Merscheider Straße 1 | 42699 Solingen | Germany
tel: +49 (0) 212.23362821 | fax: +49 (0) 212.23362879 | mobile: +49 (0) 160.3673393
www.codecentric.de | blog.codecentric.de | www.meettheexperts.de | www.more4fi.de

Registered office: Düsseldorf | HRB 63043 | Düsseldorf District Court
Executive Board: Klaus Jäger (Chairman) | Mirko Novakovic . Rainer Vehns
Supervisory Board: Patric Fedlmeier (Chairman) . Bernd Klinkmann . Jürgen Schütz

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


Re: Patch for POI parsing Smart Tags from DOCX

Posted by Fabian Lange <fa...@codecentric.de>.
Hi all,
thanks for the clarification, Nick.
I filed my patch here:
https://issues.apache.org/bugzilla/show_bug.cgi?id=52285

Let me know if the patch needs any refinement; I am new here :-)

Fabian


On Mon, Dec 5, 2011 at 12:50 PM, Nick Burch <ni...@alfresco.com> wrote:
> On Mon, 5 Dec 2011, Fabian Lange wrote:
>>
>> However, I noticed that there are no tests at all for hwpf, which
>> makes me not being very confident in my change :-)
>
>
> First up, I think your change is in XWPF, not HWPF - XWPF handles .docx
> files, while HWPF handles .doc ones
>
>
>> a) Is it correct that there are no testcases here:
>> C:\repo\poi\src\testcases\org\apache\poi\hwpf
>
>
> That's because the HWPF tests are in src/scratchpad/testcases and the XWPF
> ones are in src/ooxml/testcases (the tests live in the same part of the
> codebase as the main classes for that area)
>
> Nick


Re: Patch for POI parsing Smart Tags from DOCX

Posted by Nick Burch <ni...@alfresco.com>.
On Mon, 5 Dec 2011, Fabian Lange wrote:
> However, I noticed that there are no tests at all for hwpf, which
> makes me not being very confident in my change :-)

First up, I think your change is in XWPF, not HWPF - XWPF handles .docx
files, while HWPF handles .doc ones

> a) Is it correct that there are no testcases here:
> C:\repo\poi\src\testcases\org\apache\poi\hwpf

That's because the HWPF tests are in src/scratchpad/testcases and the XWPF
ones are in src/ooxml/testcases (the tests live in the same part of the
codebase as the main classes for that area)

Nick
