Posted to user@poi.apache.org by Nick Burch <ni...@torchbox.com> on 2008/08/12 22:14:52 UTC

Re: Rubbish in extracted text

On Fri, 16 May 2008, Rainer Schwarze wrote:
> these are fields. A quick solution is this: Pass the extracted text 
> string through a filter which removes the field codes. Fields are 
> delimited by 0x13 (start), 0x14 (separator) and 0x15 (end) bytes. With 
> fields which don't have a separator (0x14), remove all from 0x13 to 
> 0x15.

I've just added some code to svn to implement this algorithm. It's on the 
Range class, as Range.stripFields(String).
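
For readers who want to see the filtering spelled out, here is a minimal 
sketch of the algorithm Rainer describes. It is not the code committed to 
svn. The class name FieldStripper and the method stripFieldCodes are 
hypothetical. It also assumes that, for a field which does have a 
separator, the part between 0x13 and 0x14 is the field code to drop and the 
part between 0x14 and 0x15 is the visible result to keep; the quoted 
message only spells out the no-separator case.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, not part of POI: strips Word field codes from text.
final class FieldStripper {
    private static final char FIELD_START = 0x13;
    private static final char FIELD_SEPARATOR = 0x14;
    private static final char FIELD_END = 0x15;

    static String stripFieldCodes(String text) {
        StringBuilder out = new StringBuilder(text.length());
        // One entry per open field: TRUE while we are still in its code part
        // (before any 0x14), FALSE once we are in its result part.
        Deque<Boolean> open = new ArrayDeque<>();
        int hidden = 0; // number of open fields still in their code part

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == FIELD_START) {
                open.push(Boolean.TRUE);
                hidden++;
            } else if (c == FIELD_SEPARATOR) {
                // Switch the innermost field from code part to result part.
                if (Boolean.TRUE.equals(open.peek())) {
                    open.pop();
                    open.push(Boolean.FALSE);
                    hidden--;
                }
            } else if (c == FIELD_END) {
                // A field closed without a separator was hidden throughout.
                if (!open.isEmpty() && open.pop()) {
                    hidden--;
                }
            } else if (hidden == 0) {
                // Ordinary text outside every field code: keep it.
                out.append(c);
            }
            // Stray 0x14/0x15 bytes with no open field are simply dropped.
            // Text after an unterminated 0x13 is dropped as well.
        }
        return out.toString();
    }
}
```

With this sketch, "Page \u0013 PAGE \u00143\u0015 of 9" becomes 
"Page 3 of 9", and a field with no separator disappears entirely. In 
practice, calling Range.stripFields(String) on the extracted text is the 
simpler route.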

Nick
