Posted to users@maven.apache.org by Thomas Marti <th...@schweiz.org> on 2009/04/23 13:43:22 UTC

Challenge: Find a plugin combo that achieves what 4 lines of bash script can do...

Hi Maven cracks

At a customer site there is a custom, company-wide dictionary available for
spellchecking. This dictionary is managed in a proprietary application from
which you can export it. For the webapp we're building we need to transform this
dictionary into a very simple format: a single file with one dictionary entry
per line. The export format is somewhat special: it's spread over a bunch of
files (one for each letter of the alphabet), contains additional syllabication
info that we don't need, and also has some comments that have to be removed.
The specifics of the format aren't really that important here though...

After some testing I came up with the following short bash script that fulfills
all my needs:

8<-----------------------------------------------------------
tmp_folder=target/dict
cls_folder=target/classes
mkdir -p $tmp_folder
mkdir -p $cls_folder

cat src/main/dictionary/*.lst > $tmp_folder/tmp1.dict
sed "s/[~?]//g" $tmp_folder/tmp1.dict > $tmp_folder/tmp2.dict
sed "s/  .*$//g" $tmp_folder/tmp2.dict > $tmp_folder/tmp3.dict
sort -u -o $cls_folder/my.dict $tmp_folder/tmp3.dict
8<-----------------------------------------------------------

(In other words: Take all files src/main/dictionary/*.lst, concat them into one
single file, match some strings with simple regexes and remove those, and
finally sort the dictionary entries and remove all duplicates.)
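
For comparison, the same steps can be sketched in plain Java, which runs
wherever Maven does. This is only a minimal sketch, not a drop-in replacement:
the class name DictionaryMerger and its methods are made up for illustration,
the input is assumed to be UTF-8, and TreeSet orders strings by UTF-16 code
units, which may differ from the locale-dependent order of sort -u.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper class; the name is not from the original thread.
class DictionaryMerger {

    /** Applies the two sed expressions from the script to a single line. */
    static String clean(String line) {
        return line.replaceAll("[~?]", "")      // sed "s/[~?]//g"
                   .replaceFirst("  .*$", "");  // sed "s/  .*$//g"
    }

    /** Reads every *.lst file in sourceDir and writes the sorted, unique entries to target. */
    static void merge(Path sourceDir, Path target) throws IOException {
        // A TreeSet keeps entries sorted and drops duplicates, similar to sort -u.
        SortedSet<String> entries = new TreeSet<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(sourceDir, "*.lst")) {
            for (Path file : files) {
                // Assumption: the exported files are UTF-8 encoded.
                for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                    entries.add(clean(line));
                }
            }
        }
        Path parent = target.getParent();
        if (parent != null) {
            Files.createDirectories(parent);     // mkdir -p
        }
        Files.write(target, entries, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path source = Paths.get("src/main/dictionary");
        if (!Files.isDirectory(source)) {
            System.err.println("No dictionary sources found in " + source);
            return;
        }
        merge(source, Paths.get("target/classes/my.dict"));
    }
}
```

Something like this could be started from the build or wrapped in a custom
plugin; no intermediate tmp files are needed because everything is held in
memory, which should be acceptable for a file of this size.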

This script is then called from within Maven with exec-maven-plugin. Afterwards
maven-jar-plugin wraps the file in a simple jar, so the dictionary can be easily 
consumed in Java using getClassLoader().getResourceAsStream().
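
On the consuming side, reading the file from the classpath could look roughly
like this. It is a sketch only: DictionaryLoader and its methods are
hypothetical names, the resource name my.dict is taken from the script above
and assumed to sit at the root of the jar, and UTF-8 is an assumption about
the encoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class; the name is not from the original thread.
class DictionaryLoader {

    /** Reads one entry per line from the stream into a set for fast lookups. */
    static Set<String> load(InputStream in) {
        Set<String> words = new HashSet<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                words.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    /** Looks the dictionary up on the classpath, e.g. loadFromClasspath("my.dict"). */
    static Set<String> loadFromClasspath(String resourceName) {
        InputStream in = DictionaryLoader.class.getClassLoader().getResourceAsStream(resourceName);
        if (in == null) {
            // getResourceAsStream returns null when the resource is missing.
            throw new IllegalArgumentException("Resource not found on classpath: " + resourceName);
        }
        return load(in);
    }

    public static void main(String[] args) {
        // In-memory stand-in for the jar content so the example runs on its own.
        InputStream sample =
            new ByteArrayInputStream("alpha\nbeta\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(load(sample).contains("beta"));
    }
}
```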

Now all is well & nice and this script even performs sufficiently given about 1.6
million dictionary entries (~38MB). But of course it's not really the Maven way
to do things, especially because it's not portable. You need to have some kind
of Unix-like environment in place for this script to work.

I've given this some thought, but I can't seem to find a combination of
Maven plugins that's able to do what four lines of bash script achieve so elegantly.

I'd really like to hear your ideas on this matter...


Bye,
  Thomas

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@maven.apache.org
For additional commands, e-mail: users-help@maven.apache.org


RE: Challenge: Find a plugin combo that achieves what 4 lines of bash script can do...

Posted by Stan Devitt <sd...@rim.com>.
Note that the code uses  <<  to append to a file, and < is one of the characters that would otherwise have to be escaped in XML.

-----Original Message-----
From: grettke@gmail.com [mailto:grettke@gmail.com] On Behalf Of Grant
Rettke
Sent: Thursday, April 23, 2009 5:24 PM
To: Maven Users List
Subject: Re: Challenge: Find a plugin combo that achieves what 4 lines
of bash script can do...

On Thu, Apr 23, 2009 at 4:13 PM, Martin Gainty <mg...@hotmail.com>
wrote:
> CDATA is character data
> instructs the parser to leave everything inside [] alone
> http://en.wikipedia.org/wiki/CDATA

The demo code doesn't use it. Have you found that the code gets mucked
with?


---------------------------------------------------------------------
This transmission (including any attachments) may contain confidential information, privileged material (including material protected by the solicitor-client or other applicable privileges), or constitute non-public information. Any use of this information by anyone other than the intended recipient is prohibited. If you have received this transmission in error, please immediately reply to the sender and delete this information from your system. Use, dissemination, distribution, or reproduction of this transmission by unintended recipients is not authorized and may be unlawful.



Re: Challenge: Find a plugin combo that achieves what 4 lines of bash script can do...

Posted by Grant Rettke <gr...@acm.org>.
On Thu, Apr 23, 2009 at 4:13 PM, Martin Gainty <mg...@hotmail.com> wrote:
> CDATA is character data
> instructs the parser to leave everything inside [] alone
> http://en.wikipedia.org/wiki/CDATA

The demo code doesn't use it. Have you found that the code gets mucked with?


RE: Challenge: Find a plugin combo that achieves what 4 lines of bash script can do...

Posted by Martin Gainty <mg...@hotmail.com>.
CDATA is character data; it
instructs the parser to leave everything inside the [ ] alone
http://en.wikipedia.org/wiki/CDATA

Martin 

> Date: Thu, 23 Apr 2009 15:38:38 -0500
> Subject: Re: Challenge: Find a plugin combo that achieves what 4 lines of bash script can do...
> From: grettke@acm.org
> To: users@maven.apache.org
> 
> On Thu, Apr 23, 2009 at 11:03 AM, Stan Devitt <sd...@rim.com> wrote:
> >           <plugin>
> >               <groupId>org.codehaus.groovy.maven</groupId>
> >               <artifactId>gmaven-plugin</artifactId>
> >               <executions>
> >                <execution>
> >                 <phase>generate-resources</phase>
> >                 <goals>
> >                   <goal>execute</goal>
> >                 </goals>
> >                 <configuration>
> >                   <source>
> >                   <![CDATA[
> > new File( "target" ).mkdirs();
> > def resultfile = new File( "target/dictionary" );
> > new File("src/main/dictionary").eachFileMatch(~/.*\.lst/){ file ->
> >  file.readLines().sort().each(){resultfile << it.trim() + "\n";}
> > }
> >                   ]]>
> >                   </source>
> >                 </configuration>
> >                </execution>
> >               </executions>
> >           </plugin>
> 
> http://groovy.codehaus.org/GMaven+-+Executing+Groovy+Code#GMaven-ExecutingGroovyCode-ExecuteanInlineGroovyScript
> 
> Why is the CDATA in there?

Re: Challenge: Find a plugin combo that achieves what 4 lines of bash script can do...

Posted by Grant Rettke <gr...@acm.org>.
Understood. Thanks!

On Fri, Apr 24, 2009 at 9:20 AM, Stan Devitt <sd...@rim.com> wrote:
> The  <![CDATA[ .... ]]>  bracketing simply tells the XML parser to read the enclosed text as raw text.  This allows you to use embedded characters like < and >, and is quite handy when the text is a code fragment that contains a lot of these special characters.  (It allows you to cut and paste into XML.)
>
> The resulting parsed XML document is identical whether you use entities to escape individual characters or use CDATA.
>
> If your program writes the document out again, the CDATA will be gone and the corresponding text in the resulting document will escape individual characters with entities like &gt; and &lt;.


RE: Challenge: Find a plugin combo that achieves what 4 lines of bash script can do...

Posted by Stan Devitt <sd...@rim.com>.
The  <![CDATA[ .... ]]>  bracketing simply tells the XML parser to read the enclosed text as raw text.  This allows you to use embedded characters like < and >, and is quite handy when the text is a code fragment that contains a lot of these special characters.  (It allows you to cut and paste into XML.)

The resulting parsed XML document is identical whether you use entities to escape individual characters or use CDATA.

If your program writes the document out again, the CDATA will be gone and the corresponding text in the resulting document will escape individual characters with entities like &gt; and &lt;.
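
This can be checked with the JDK's own DOM parser. The following is a small
sketch with a made-up class name (CdataDemo) and sample strings; it compares
only the text content of the element, which is what a consumer such as a
plugin configuration typically reads.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class; the name is not from the original thread.
class CdataDemo {

    /** Parses the XML string and returns the text content of its root element. */
    static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement()
                          .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // The same script text, once wrapped in CDATA and once escaped with entities.
        String withCdata    = "<source><![CDATA[resultfile << it.trim()]]></source>";
        String withEntities = "<source>resultfile &lt;&lt; it.trim()</source>";
        System.out.println(rootText(withCdata).equals(rootText(withEntities)));
    }
}
```

Both variants yield the same text for the element, so the choice between
CDATA and entities is a matter of readability in the source file.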

Stan
-----Original Message-----
From: grettke@gmail.com [mailto:grettke@gmail.com] On Behalf Of Grant Rettke
Sent: Thursday, April 23, 2009 4:39 PM
To: Maven Users List
Subject: Re: Challenge: Find a plugin combo that achieves what 4 lines of bash script can do...

On Thu, Apr 23, 2009 at 11:03 AM, Stan Devitt <sd...@rim.com> wrote:
>           <plugin>
>               <groupId>org.codehaus.groovy.maven</groupId>
>               <artifactId>gmaven-plugin</artifactId>
>               <executions>
>                <execution>
>                 <phase>generate-resources</phase>
>                 <goals>
>                   <goal>execute</goal>
>                 </goals>
>                 <configuration>
>                   <source>
>                   <![CDATA[
> new File( "target" ).mkdirs();
> def resultfile = new File( "target/dictionary" );
> new File("src/main/dictionary").eachFileMatch(~/.*\.lst/){ file ->
>  file.readLines().sort().each(){resultfile << it.trim() + "\n";}
> }
>                   ]]>
>                   </source>
>                 </configuration>
>                </execution>
>               </executions>
>           </plugin>

http://groovy.codehaus.org/GMaven+-+Executing+Groovy+Code#GMaven-ExecutingGroovyCode-ExecuteanInlineGroovyScript

Why is the CDATA in there?



Re: Challenge: Find a plugin combo that achieves what 4 lines of bash script can do...

Posted by Grant Rettke <gr...@acm.org>.
On Thu, Apr 23, 2009 at 11:03 AM, Stan Devitt <sd...@rim.com> wrote:
>           <plugin>
>               <groupId>org.codehaus.groovy.maven</groupId>
>               <artifactId>gmaven-plugin</artifactId>
>               <executions>
>                <execution>
>                 <phase>generate-resources</phase>
>                 <goals>
>                   <goal>execute</goal>
>                 </goals>
>                 <configuration>
>                   <source>
>                   <![CDATA[
> new File( "target" ).mkdirs();
> def resultfile = new File( "target/dictionary" );
> new File("src/main/dictionary").eachFileMatch(~/.*\.lst/){ file ->
>  file.readLines().sort().each(){resultfile << it.trim() + "\n";}
> }
>                   ]]>
>                   </source>
>                 </configuration>
>                </execution>
>               </executions>
>           </plugin>

http://groovy.codehaus.org/GMaven+-+Executing+Groovy+Code#GMaven-ExecutingGroovyCode-ExecuteanInlineGroovyScript

Why is the CDATA in there?


RE: Challenge: Find a plugin combo that achieves what 4 lines of bash script can do...

Posted by Stan Devitt <sd...@rim.com>.
One of the nice things about maven projects is that you download them
and they just work.

Assuming the dictionary source files are already broken down by letters
of the alphabet, then the following 5 lines of code do most of it.
(Note that the sed scripts are pretty close to just a line-by-line
trim() and, of course, you may need to sort the file names.)



           <plugin>
               <groupId>org.codehaus.groovy.maven</groupId>
               <artifactId>gmaven-plugin</artifactId>
               <executions>
                <execution>
                 <phase>generate-resources</phase>
                 <goals>
                   <goal>execute</goal>
                 </goals>
                 <configuration>
                   <source>
                   <![CDATA[
new File( "target" ).mkdirs();
def resultfile = new File( "target/dictionary" );
new File("src/main/dictionary").eachFileMatch(~/.*\.lst/){ file ->
  file.readLines().sort().each(){resultfile << it.trim() + "\n";}
}
                   ]]>
                   </source>
                 </configuration>
                </execution>
               </executions>
           </plugin>


Stan



Re: Challenge: Find a plugin combo that achieves what 4 lines of bash script can do...

Posted by Barrie Treloar <ba...@gmail.com>.
On Thu, Apr 23, 2009 at 10:13 PM, Thomas Marti <th...@schweiz.org> wrote:
> Hi Manos
>
> Manos Batsis wrote:
>>
>> Thomas Marti wrote:
>> That shouldn't be a problem, even Windows can play along using Cygwin. If
>> you really need out-of-the-box portability just patch up a custom plugin
>> that does the same through Java code.
>
> That's all true. But that's not really the point here...
>
> I was very surprised that I haven't been able to find plugins to achieve a
> few simple tasks like merging/concatenating files together, replacing random
> strings (that aren't properties) in resources, and finally sorting files.
>
> Now, as for existing plugins for replacing, I have found two candidates:
> http://www.stephenduncanjr.com/projects/xpathreplacement-maven-plugin/
> http://code.google.com/p/maven-replacer-plugin/
>
> But xpathreplacement-maven-plugin seems mainly XML-oriented and the
> replacer-plugin would need quite a bit of improvement, if I look at the
> code...

Maven is a software project management and comprehension tool, not a
general-purpose utility.

As someone has noted, you could write your own plugin to cater for this
if needed, but you could also just invoke Ant (from Maven), which has
more utility support of the kind you are wanting.

The other point to note is that you don't always need a portable
solution, especially if you are confined to your corporate
environment and have a known SOE to support.


Re: Challenge: Find a plugin combo that achieves what 4 lines of bash script can do...

Posted by Grant Rettke <gr...@acm.org>.
On Thu, Apr 23, 2009 at 7:43 AM, Thomas Marti <th...@schweiz.org> wrote:
> I was very surprised that I haven't been able to find plugins to achieve a
> few simple tasks like merging/concatenating files together, replacing random
> strings (that aren't properties) in resources, and finally sorting files.

Necessity is the mother of invention :)


Re: Challenge: Find a plugin combo that achieves what 4 lines of bash script can do...

Posted by Thomas Marti <th...@schweiz.org>.
Hi Manos

Manos Batsis wrote:
> Thomas Marti wrote:
> That shouldn't be a problem, even Windows can play along using Cygwin.
> If you really need out-of-the-box portability just patch up a custom
> plugin that does the same through Java code.

That's all true. But that's not really the point here...

I was very surprised that I haven't been able to find plugins to achieve a few
simple tasks like merging/concatenating files together, replacing random strings
(that aren't properties) in resources, and finally sorting files.

Now, as for existing plugins for replacing, I have found two candidates:
http://www.stephenduncanjr.com/projects/xpathreplacement-maven-plugin/
http://code.google.com/p/maven-replacer-plugin/

But xpathreplacement-maven-plugin seems mainly XML-oriented and the 
replacer-plugin would need quite a bit of improvement, if I look at the code...


Bye, Thomas


Re: Challenge: Find a plugin combo that achieves what 4 lines of bash script can do...

Posted by Thomas Marti <th...@schweiz.org>.
Argh, was too quick with the send button...

The two maven plugins are:
http://www.stephenduncanjr.com/projects/xpathreplacement-maven-plugin
http://code.google.com/p/maven-replacer-plugin/

But xpathreplacement-maven-plugin seems more XML-oriented and the 
replacer-plugin would need a bit of improvement, if I look at the code...


Bye, Thomas

Manos Batsis wrote:
> Thomas Marti wrote:
> 
>> Now all is well & nice and this script even performs sufficiently given
>> about 1.6 million dictionary entries (~38MB). But of course it's not really
>> the Maven way to do things, especially because it's not portable. You need
>> to have some kind of Unix-like environment in place for this script to
>> work.
>
> That shouldn't be a problem, even Windows can play along using Cygwin.
> If you really need out-of-the-box portability just patch up a custom
> plugin that does the same through Java code.
> 
> hth,
> 
> Manos


Re: Challenge: Find a plugin combo that achieves what 4 lines of bash script can do...

Posted by Wayne Fay <wa...@gmail.com>.
> I was just very surprised that I haven't been able to find plugins to achieve
> a few simple tasks like merging/concatenating files together, replacing random
> strings (that aren't properties) in resources, and finally sorting files.

You're the first person who needed to do this as part of their build,
it would seem...

It doesn't really make sense to me that you'd do this dictionary
merging/sorting every single time you run a build, or even
occasionally such that you'd want a profile, so I'd probably not be
looking for a Maven-solution to this anyway (personally). But I don't
fully understand the requirements here so it is likely that I am
missing something obvious.

Wayne


Re: Challenge: Find a plugin combo that achieves what 4 lines of bash script can do...

Posted by Thomas Marti <th...@schweiz.org>.
Hi Manos

Manos Batsis wrote:
> Thomas Marti wrote:
> 
> That shouldn't be a problem, even Windows can play along using Cygwin.
> If you really need out-of-the-box portability just patch up a custom
> plugin that does the same through Java code.

That's all true. But that's not really the point here.

I was just very surprised that I haven't been able to find plugins to achieve a
few simple tasks like merging/concatenating files together, replacing random strings
(that aren't properties) in resources, and finally sorting files.

Now, as for existing plugins for replacing, I have found two candidates:
http://www.stephenduncanjr.com/projects/xpathreplacement-maven-plugin/


Re: Challenge: Find a plugin combo that achieves what 4 lines of bash script can do...

Posted by Manos Batsis <ma...@geekologue.com>.
Thomas Marti wrote:

> Now all is well & nice and this script even performs sufficiently given
> about 1.6 million dictionary entries (~38MB). But of course it's not really
> the Maven way to do things, especially because it's not portable. You need
> to have some kind of Unix-like environment in place for this script to work.

That shouldn't be a problem, even Windows can play along using Cygwin.
If you really need out-of-the-box portability just patch up a custom
plugin that does the same through Java code.

hth,

Manos
