Posted to dev@commons.apache.org by Laird Nelson <lj...@yahoo.com> on 2001/12/11 20:07:04 UTC

Thoughts on StringUtils architecture

Here's an architectural thought that occurred to me.  It's related to
the overall architecture of StringUtils.

When munging text, frequently you want to work on isolated Strings. But
just as frequently you want to work on character streams, which deliver
their text in chunks.  What happens if you read a chunk, and the last
two characters of that chunk are the *first* two characters of your
three-character-long String-to-be-escaped?  Passing the chunk to a
stateless escape() method, for example, won't escape those last two
characters, because of course it doesn't know that the third one is on
the way.
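
To make the chunk-boundary problem concrete, here is a minimal sketch.
The class name, the method names, and the choice of "]]>" as the
three-character sequence to escape are all assumptions made up for
illustration; none of them come from StringUtils.

```java
/** Illustration only: a stateless escape applied chunk by chunk. */
final class ChunkBoundaryDemo {

    // Hypothetical stateless escape: rewrites each complete "]]>" it sees.
    static String escape(String text) {
        return text.replace("]]>", "]]&gt;");
    }

    // Escapes every chunk on its own, the way a naive Reader wrapper might.
    static String escapeChunks(String... chunks) {
        StringBuilder out = new StringBuilder();
        for (String chunk : chunks) {
            out.append(escape(chunk));
        }
        return out.toString();
    }
}
```

Calling escape("a]]>b") rewrites the sequence, but
escapeChunks("a]]", ">b") passes it through untouched, because neither
chunk contains the whole three-character sequence.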

The architecture I've found that works pretty well--although it seems
like overkill when presented this way, so bear with me--is something
like this (apologies if anyone finds this boringly obvious; it was a
Moment for me and my slow brain :-)):

Suppose you want to interpolate variable references ${like} ${this} and
replace them with, say, System.getProperty("like") and
System.getProperty("this").  And suppose you want to do that work so
that you can invoke it from a standalone class like StringUtils or a
Reader class.

The best bet is to implement a parser that takes in a StringBuffer (the
raw text) and some kind of value object that holds the parser's state,
and that returns something convenient, like the StringBuffer
interpolated so far, or the new state.  java.text.ParsePosition is a
bare-bones example of this sort of state object; it is used by
java.text.MessageFormat etc.
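
For readers who haven't met ParsePosition, the sketch below shows the
caller-owned-state idea using java.text.NumberFormat, which accepts a
ParsePosition in the same way.  The class and method names are
hypothetical, and the input is assumed to be integers separated by
single spaces.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustration only: the caller owns the state object and passes it in. */
final class ParsePositionDemo {

    // Parses integers separated by single spaces, e.g. "12 34 56".
    static List<Long> parseAll(String text) {
        NumberFormat format = NumberFormat.getIntegerInstance(Locale.ROOT);
        format.setGroupingUsed(false);
        ParsePosition pos = new ParsePosition(0);   // state lives outside the parser
        List<Long> numbers = new ArrayList<>();
        while (pos.getIndex() < text.length()) {
            Number n = format.parse(text, pos);     // advances pos past what it consumed
            if (n == null) {
                break;                              // pos.getErrorIndex() says where it failed
            }
            numbers.add(n.longValue());
            pos.setIndex(pos.getIndex() + 1);       // skip the separator (assumed to be one char)
        }
        return numbers;
    }
}
```

The parser itself keeps nothing between calls; everything it needs to
resume is in the ParsePosition the caller hands back to it.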

That way if you call the parser several times on chunks of text that
look, for example, like this:

  Chunk 1: Hi, there, ${us
  Chunk 2: er.name}!  Earn free $
  Chunk 3: $$!

...the parser will report, via the state object, whether it's done with
a piece yet.  If you're in a Reader you can pay attention to this, and
if you're in, say, StringUtils, you can ignore it.

Now if you invoke the parser from a standalone class like StringUtils,
you just ignore the fact that it's not done yet, and you get, as
results:

  Result of munging chunk 1: Hi, there, ${us
  Result of munging chunk 2: er.name}!  Earn free $
  Result of munging chunk 3: $$!

...i.e. in this stupid case the same as what you put in.  But if you
invoke the parser via a Reader using the same chunks, you can see that
you could build the Reader in such a way as to have a cache that would
let you return this:

  Result of three read(char[], int, int) calls:
    Hi, there, lnelson!  Earn free $$$!

So in general for greater-than-single-character munging, it pays to
create:

  1. A parser where you pass in its initial state each time you
     parse/munge the raw text
  2. A standalone method/class that simply invokes the parser once on
     the supplied String and ignores whether it's done or not
  3. A reader (or writer, or both) that feeds the parser as little as
     the parser needs to complete its work
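
The three pieces above could be sketched roughly as follows.  Every
class and method name here is hypothetical, the sketch uses
StringBuilder and java.util.function.Function from later Java versions,
and it assumes that a "${" with no closing "}" should be passed through
verbatim at the end of input.  Treat it as one possible shape, not a
proposed StringUtils API.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.function.Function;

/** Hypothetical state object: text seen so far that cannot be resolved yet. */
final class InterpolationState {
    final StringBuilder pending = new StringBuilder();
    boolean isDone() { return pending.length() == 0; }
}

/** 1. Hypothetical parser; the caller owns the state and passes it in each time. */
final class Interpolator {
    private final Function<String, String> lookup;
    Interpolator(Function<String, String> lookup) { this.lookup = lookup; }

    /** Returns the text resolved so far; an incomplete tail stays in state.pending. */
    String parse(CharSequence chunk, InterpolationState state) {
        String text = state.pending.append(chunk).toString();
        state.pending.setLength(0);
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int dollar = text.indexOf('$', i);
            if (dollar < 0) { out.append(text, i, text.length()); break; }
            out.append(text, i, dollar);
            if (dollar + 1 == text.length()) {        // trailing '$' might begin "${"
                state.pending.append('$');
                break;
            }
            if (text.charAt(dollar + 1) != '{') {     // ordinary '$', copy it through
                out.append('$');
                i = dollar + 1;
                continue;
            }
            int close = text.indexOf('}', dollar + 2);
            if (close < 0) {                          // "${..." with no '}' yet
                state.pending.append(text, dollar, text.length());
                break;
            }
            String value = lookup.apply(text.substring(dollar + 2, close));
            // Unknown names are left as written (an assumption of this sketch).
            out.append(value != null ? value : text.substring(dollar, close + 1));
            i = close + 1;
        }
        return out.toString();
    }
}

/** 2. Standalone use: one call, ignoring whether the parser is done. */
final class StringUtilsSketch {
    static String interpolate(String s, Function<String, String> lookup) {
        InterpolationState state = new InterpolationState();
        String resolved = new Interpolator(lookup).parse(s, state);
        return resolved + state.pending;   // leftover text is emitted verbatim
    }
}

/** 3. Reader that caches resolved text and keeps feeding the parser chunks. */
final class InterpolatingReader extends Reader {
    private final Reader in;
    private final Interpolator parser;
    private final InterpolationState state = new InterpolationState();
    private final StringBuilder ready = new StringBuilder();  // resolved, not yet returned
    private final char[] chunk;                               // chunkSize must be > 0
    private boolean eof;

    InterpolatingReader(Reader in, Interpolator parser, int chunkSize) {
        this.in = in;
        this.parser = parser;
        this.chunk = new char[chunkSize];
    }

    @Override public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        while (ready.length() == 0 && !eof) {
            int got = in.read(chunk, 0, chunk.length);
            if (got < 0) {                            // end of input: flush the tail
                eof = true;
                ready.append(state.pending);
                state.pending.setLength(0);
            } else {
                ready.append(parser.parse(new String(chunk, 0, got), state));
            }
        }
        if (ready.length() == 0) return -1;
        int n = Math.min(len, ready.length());
        ready.getChars(0, n, buf, off);
        ready.delete(0, n);
        return n;
    }

    @Override public void close() throws IOException { in.close(); }
}
```

To get the System.getProperty behaviour described earlier, you could
pass System::getProperty as the lookup.  The standalone method and the
Reader share the one parser; the only difference is whether anyone
looks at the state between calls.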

I bring this up simply to call attention to it--basically to point out
that one should remember character streams when one is building
String-whacking routines such as escape().

Cheers,
Laird


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


RE: Thoughts on StringUtils architecture

Posted by Scott Sanders <ss...@nextance.com>.
But then you would just use Velocity :)

Scott


