Posted to user@ant.apache.org by Rao Chaudhri <rc...@1fbusa.com> on 2010/07/26 20:22:56 UTC

Remove duplicate JAR file names from an XML file

I have an XML file which lists JAR file names, with a space as the delimiter. Some JAR file names are duplicated in the file, and I was wondering if there is a way to somehow get rid of the duplicates, to get a file which has a unique set of JAR file names?

 

Example of file:

temp.xml:

antlr-2.7.6.jar antlr-2.7.6.jar antlr-2.7.6.jar aopalliance-1.0.jar aopalliance-1.0.jar aopalliance-1.0.jar commons-validator-1.0.2.jar commons-validator-1.0.2.jar commons-lang-2.2.jar commons-lang-2.2.jar

 

Usman Chaudhri

 


Re: Remove duplicate JAR file names from an XML file

Posted by Michael Ludwig <mi...@gmx.de>.
Rao Chaudhri schrieb am 26.07.2010 um 11:22 (-0700):
> I have an xml file which lists JAR file names in it with space
> specified as a delimiter. There are duplications of JAR files name in
> the file and I was wondering if there is a way to some how get ride of
> the duplication, to get a file which has a unique set of JAR file
> names?

If you happen to have Perl around, you could use <exec> to do:

perl -nale '$j{$_}++ for @F; END {print for sort keys %j}' < jars.txt

* define a structure to reduce duplicates (set or map)
* tokenize input
* store in structure
* print output, possibly sorted
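
If Perl is not at hand, the same four steps can be sketched in plain Java. This is only an illustration: the class and method names are made up, and a TreeSet is assumed as the "structure", which gives the sorting for free.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class UniqueTokens {
    // Tokenize on any run of whitespace, store in a sorted set, return as a list.
    static List<String> uniqueSorted(String input) {
        TreeSet<String> seen = new TreeSet<>();          // set drops duplicates, keeps order
        for (String token : input.trim().split("\\s+")) {
            if (!token.isEmpty()) {                      // blank input yields one empty token
                seen.add(token);
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        String jars = "antlr-2.7.6.jar antlr-2.7.6.jar aopalliance-1.0.jar "
                    + "commons-validator-1.0.2.jar commons-lang-2.2.jar commons-lang-2.2.jar";
        for (String jar : uniqueSorted(jars)) {
            System.out.println(jar);
        }
    }
}
```

A LinkedHashSet instead of the TreeSet would keep the first-seen order rather than sorting.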

-- 
Michael Ludwig

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@ant.apache.org
For additional commands, e-mail: user-help@ant.apache.org


RE: AW: Remove duplicate JAR file names from an XML file

Posted by Rao Chaudhri <rc...@1fbusa.com>.
I did something similar:
I am using string comparison to get rid of duplicates. A for loop iterates through the list of JAR files; in each iteration I store the name of the JAR file in a temporary variable, and then in the next iteration I compare the current property with the temporary variable.
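
One caveat with comparing against the previous name: it only catches duplicates that sit next to each other, as they do in the example list. A rough Java sketch of the idea follows; the sort step and the class and method names are illustrative assumptions, not the loop actually used in the build.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AdjacentDedupe {
    // Keep a name only when it differs from the one seen in the previous iteration.
    static List<String> dedupeAdjacent(String[] names) {
        String[] sorted = names.clone();
        Arrays.sort(sorted);                 // assumption: sort so equal names become neighbours
        List<String> out = new ArrayList<>();
        String previous = null;              // the "temporary variable"
        for (String name : sorted) {
            if (!name.equals(previous)) {
                out.add(name);
            }
            previous = name;
        }
        return out;
    }

    public static void main(String[] args) {
        String[] jars = {"commons-lang-2.2.jar", "antlr-2.7.6.jar", "commons-lang-2.2.jar"};
        System.out.println(dedupeAdjacent(jars));
    }
}
```

Without the sort, a list like "a.jar b.jar a.jar" would keep both copies of a.jar.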

Thanks

Usman Chaudhri

-----Original Message-----
From: Jan.Materne@rzf.fin-nrw.de [mailto:Jan.Materne@rzf.fin-nrw.de] 
Sent: Monday, July 26, 2010 11:33 PM
To: user@ant.apache.org
Subject: AW: Remove duplicate JAR file names from an XML file

>I have an xml file which lists JAR file names in it with space 
>specified as a delimiter. There are duplications of JAR files 
>name in the file and I was wondering if there is a way to some 
>how get ride of the duplication, to get a file which has a 
>unique set of JAR file names?
> 
>
>Example of file:
>
>antlr-2.7.6.jar antlr-2.7.6.jar antlr-2.7.6.jar 
>aopalliance-1.0.jar aopalliance-1.0.jar aopalliance-1.0.jar 
>commons-validator-1.0.2.jar commons-validator-1.0.2.jar 
>commons-lang-2.2.jar commons-lang-2.2.jar


I don't see any built-in possibility, so you have to script/program your
own task.
- you could read the XML with <xmlproperty>
- address the list as a property
- split the property value and store it in a set
- change the set into a space delimited list and store it as property
- copy the xml file with replacing ${xml-property} by
${calculated-property}

Just my 5ct ;)


Jan



Re: Remove duplicate JAR file names from an XML file

Posted by Gilbert Rebhan <gi...@maksimo.de>.
-------- Original Message  --------
Subject: Re: Remove duplicate JAR file names from an XML file
From: Michael Ludwig <mi...@gmx.de>
To: Ant Users List <us...@ant.apache.org>
Date: 31.07.2010 19:19


> Looks like the <union> is not needed in this case as <pathconvert>
> seems to imply removal of duplicates.

yep, you're right :-)

<pathconvert property="uniquejars" pathsep=",">
 <sort>
  <tokens>
   <propertyresource name="alljars"/>
   <stringtokenizer/>
  </tokens>
 </sort>
</pathconvert>

does the job.


Regards, Gilbert








Re: Remove duplicate JAR file names from an XML file

Posted by Michael Ludwig <mi...@gmx.de>.
Gilbert Rebhan schrieb am 31.07.2010 um 14:50 (+0200):
> > Gilbert Rebhan schrieb am 31.07.2010 um 00:47 (+0200):

> > Should be using a regex here: getProperty("alljars").split("\\s+");
> > That will take care of linebreaks and tabs, not only spaces.
> 
> not required in that case, as xmltask uses a blank as default

I think that was for the case of a whitespace-separated list of jars
taken from the XML by using the text() node test.

> > <project>
> >   <file file="res.txt" id="input"/>
> >   <union id="tokens">
> >     <sort>
> >       <tokens>
> >         <resources refid="input"/>
> >         <stringtokenizer/>
> >       </tokens>
> >     </sort>
> >   </union>
> >   <pathconvert refid="tokens"
> >     pathsep="${line.separator}"
> >     property="tokens2" />
> >   <echo message="${tokens2}"/>
> > </project>

Looks like the <union> is not needed in this case as <pathconvert>
seems to imply removal of duplicates.

> <pathconvert property="uniquejars" pathsep=",">
>  <union>
>   <sort>
>    <tokens>
>     <propertyresource name="alljars"/>
>     <stringtokenizer/>
>    </tokens>
>   </sort>
>  </union>
> </pathconvert>

Same here, you can drop the <union> and get the same result.

> if you like perl, you may like (j)ruby or groovy
> for scripting in ant <script ../> <scriptdef ../>
> Sometimes it's easier to write a small script before
> using too much clumsy xml or writing a new task.
> 
> Also try ant-flaka [1], which aims to simplify writing ant files.
> i just started using it, and it rocks :-)

> [1] http://code.google.com/p/flaka/

Thanks, looks interesting!

Best,
-- 
Michael Ludwig



Re: Remove duplicate JAR file names from an XML file

Posted by Gilbert Rebhan <gi...@maksimo.de>.
-------- Original Message  --------
Subject: Re: Remove duplicate JAR file names from an XML file
From: Michael Ludwig <mi...@gmx.de>
To: Ant Users List <us...@ant.apache.org>
Date: 31.07.2010 13:21

> Gilbert Rebhan schrieb am 31.07.2010 um 00:47 (+0200):
> 
>> Whenever some kind of xml processing occurs within your ant workflow
>> i recommend the use of the xmltask[1].
> 
>> From your first posting i assume you have some xml like :
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <jars>
>> <files>antlr-2.7.6.jar antlr-2.7.6.jar antlr-2.7.6.jar
>> aopalliance-1.0.jar aopalliance-1.0.jar aopalliance-1.0.jar
>> commons-validator-1.0.2.jar commons-validator-1.0.2.jar
>> commons-lang-2.2.jar commons-lang-2.2.jar</files>
>> </jars>
> 
> If that is the OP's XML, poor design indeed.

sure, he didn't provide more details than =
/*
I have an xml file which lists JAR file names in it with space
>specified as a delimiter.
*/

> Should be using a regex here: getProperty("alljars").split("\\s+");
> That will take care of linebreaks and tabs, not only spaces.

not required in that case, as xmltask uses a blank as default
delimiter if not specified :
<xmltask source="./jars.xml">
<copy path="//files/text()"
      append="true"
      property="alljars"
/>
</xmltask>


>>   [xmltask] Cannot append values to properties
>>    ... don' get annoyed from those messages, simply ignore
>>    or do a search in the xmlproperty task sources and comment it out
> 
> Why is it there in the first place? Appending to a property seems to
> work just fine, at least in this case.

From what I believe it's a warning kept for backward compatibility;
it should be logged at debug level only.
Appending to a property works fine.

> While I like XML, I think it's overkill for a list of items. Here's a
> plain text example:

Maybe it's not his choice, as he gets the XML from an external tool!?

> michael@wladimir:~ :-) expand -t2 res.xml
> <project>
>   <file file="res.txt" id="input"/>
>   <union id="tokens">
>     <sort>
>       <tokens>
>         <resources refid="input"/>
>         <stringtokenizer/>
>       </tokens>
>     </sort>
>   </union>
>   <pathconvert refid="tokens"
>     pathsep="${line.separator}"
>     property="tokens2" />
>   <echo message="${tokens2}"/>
> </project>
> 
> michael@wladimir:~ :-) ant -f res.xml 
> Buildfile: T:\cygwin\home\michael\res.xml
>      [echo] antlr-2.7.6.jar
>      [echo] aopalliance-1.0.jar
>      [echo] commons-lang-2.2.jar
>      [echo] commons-validator-1.0.2.jar
> 
> BUILD SUCCESSFUL
> Total time: 0 seconds
> -------------------------
> 
> Can any of it be simplified further?

yep, you're right, I forgot resources as usual,
had been with Ant 1.6.5 much too long ;-)

i would write it that way =

<?xml version="1.0" encoding="UTF-8"?>
<project>
<!-- Import XMLTask -->
 <taskdef name="xmltask"
 classname="com.oopsconsultancy.xmltask.ant.XmlTask"/>

<target name="depends">
<xmltask source="./jars.xml">
<copy path="//files/text()"
      append="true"
      property="alljars"
/>
</xmltask>

<pathconvert property="uniquejars" pathsep=",">
 <union>
  <sort>
   <tokens>
    <propertyresource name="alljars"/>
    <stringtokenizer/>
   </tokens>
  </sort>
 </union>
</pathconvert>

</target>

<target name="main" depends="depends">
 <echo>$${uniquejars} = ${uniquejars}</echo>
 <echo>$${alljars} = ${alljars}</echo>
</target>
</project>

-----------------------------------

if you like perl, you may like (j)ruby or groovy
for scripting in ant <script ../> <scriptdef ../>
Sometimes it's easier to write a small script before
using too much clumsy xml or writing a new task.

Also try ant-flaka [1], which aims to simplify writing ant files.
i just started using it, and it rocks :-)


Regards, Gilbert


[1] http://code.google.com/p/flaka/




Re: Remove duplicate JAR file names from an XML file

Posted by Michael Ludwig <mi...@gmx.de>.
Gilbert Rebhan schrieb am 31.07.2010 um 00:47 (+0200):

> Whenever some kind of xml processing occurs within your ant workflow
> i recommend the use of the xmltask[1].

> From your first posting i assume you have some xml like :
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <jars>
> <files>antlr-2.7.6.jar antlr-2.7.6.jar antlr-2.7.6.jar
> aopalliance-1.0.jar aopalliance-1.0.jar aopalliance-1.0.jar
> commons-validator-1.0.2.jar commons-validator-1.0.2.jar
> commons-lang-2.2.jar commons-lang-2.2.jar</files>
> </jars>

If that is the OP's XML, poor design indeed.

> 1) use xmltask/xpath to get a list of filenames and <script ../>
> afterwards to get your distinct list.

>   <script language="beanshell">
>   <![CDATA[
>     String[] jars = project.getProperty("alljars").split(" ");

Should be using a regex here: getProperty("alljars").split("\\s+");

That will take care of linebreaks and tabs, not only spaces.

>      [echo] ${uniquejars} = [antlr-2.7.6.jar, commons-lang-2.2.jar
>      [echo] , aopalliance-1.0.jar, commons-lang-2.2.jar,
> commons-validator-1.0.2.jar]

Duplicates will then be removed.
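
To make the difference concrete, here is a small plain-Java sketch (not the BeanShell script itself; the class and method names are made up for illustration). It splits a value that contains a line break once with " " and once with "\\s+".

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class SplitDemo {
    // Split the value with the given regex and keep each distinct token once.
    static Set<String> distinct(String value, String regex) {
        return new LinkedHashSet<>(Arrays.asList(value.trim().split(regex)));
    }

    public static void main(String[] args) {
        // Two jars, each listed twice, with a line break in the middle.
        String alljars = "antlr-2.7.6.jar antlr-2.7.6.jar\n"
                       + "commons-lang-2.2.jar commons-lang-2.2.jar";
        // " " leaves "antlr-2.7.6.jar\ncommons-lang-2.2.jar" glued together as one token.
        System.out.println(distinct(alljars, " ").size());     // 3
        // "\\s+" also splits at the line break.
        System.out.println(distinct(alljars, "\\s+").size());  // 2
    }
}
```

The trim() is there because split("\\s+") returns a leading empty string when the value starts with whitespace.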

> 2) take influence on the creation of the xml file if possible,
> and create a structure that is more xpath suitable and simply make use
> of xpath

Much better approach if you want to use XML!

>   [xmltask] Cannot append values to properties
>    ... don' get annoyed from those messages, simply ignore
>    or do a search in the xmlproperty task sources and comment it out

Why is it there in the first place? Appending to a property seems to
work just fine, at least in this case.

While I like XML, I think it's overkill for a list of items. Here's a
plain text example:

          \,,,/
          (o o)
------oOOo-(_)-oOOo------
michael@wladimir:~ :-) cat res.txt 
antlr-2.7.6.jar antlr-2.7.6.jar
antlr-2.7.6.jar aopalliance-1.0.jar
aopalliance-1.0.jar aopalliance-1.0.jar
commons-validator-1.0.2.jar commons-validator-1.0.2.jar
commons-lang-2.2.jar commons-lang-2.2.jar

michael@wladimir:~ :-) expand -t2 res.xml
<project>
  <file file="res.txt" id="input"/>
  <union id="tokens">
    <sort>
      <tokens>
        <resources refid="input"/>
        <stringtokenizer/>
      </tokens>
    </sort>
  </union>
  <pathconvert refid="tokens"
    pathsep="${line.separator}"
    property="tokens2" />
  <echo message="${tokens2}"/>
</project>

michael@wladimir:~ :-) ant -f res.xml 
Buildfile: T:\cygwin\home\michael\res.xml
     [echo] antlr-2.7.6.jar
     [echo] aopalliance-1.0.jar
     [echo] commons-lang-2.2.jar
     [echo] commons-validator-1.0.2.jar

BUILD SUCCESSFUL
Total time: 0 seconds
-------------------------

Can any of it be simplified further?

Maybe it's just me, but I've been struggling with the Ant documentation
to find out how to achieve this. It's so simple in Perl, rather
complicated in Java, and non-obvious in Ant.

When I learnt programming Perl (first language) a couple years ago,
there was a key moment when I understood that you have to grasp each
operation in terms of its input and output, which will allow you to
combine them elegantly and seamlessly:

  print map "$_\n", sort keys %j;

It's about knowing how the different pieces can be combined. I think in
language design this is referred to as composability, usually seen as a
good thing.

With Ant, I'm frequently unsure how to combine things. Well, it's still
not entirely clear, but this doc helped me:

http://ant.apache.org/manual/Types/resources.html

-- 
Michael Ludwig



Re: Remove duplicate JAR file names from an XML file

Posted by Gilbert Rebhan <gi...@maksimo.de>.
-------- Original Message  --------
Subject: Re: Remove duplicate JAR file names from an XML file
From: <Ja...@rzf.fin-nrw.de>
To: user@ant.apache.org
Date: 27.07.2010 08:32

>> I have an xml file which lists JAR file names in it with space 
>> specified as a delimiter. There are duplications of JAR files 
>> name in the file and I was wondering if there is a way to some 
>> how get ride of the duplication, to get a file which has a 
>> unique set of JAR file names?
>>
>>
>> Example of file:
>>
>> antlr-2.7.6.jar antlr-2.7.6.jar antlr-2.7.6.jar 
>> aopalliance-1.0.jar aopalliance-1.0.jar aopalliance-1.0.jar 
>> commons-validator-1.0.2.jar commons-validator-1.0.2.jar 
>> commons-lang-2.2.jar commons-lang-2.2.jar


Whenever some kind of xml processing occurs within your ant workflow
i recommend the use of the xmltask[1].

Here are two solutions for your problem.
From your first posting I assume you have some XML like:

<?xml version="1.0" encoding="UTF-8"?>
<jars>
<files>antlr-2.7.6.jar antlr-2.7.6.jar antlr-2.7.6.jar
aopalliance-1.0.jar aopalliance-1.0.jar aopalliance-1.0.jar
commons-validator-1.0.2.jar commons-validator-1.0.2.jar
commons-lang-2.2.jar commons-lang-2.2.jar</files>
</jars>

1) use xmltask/xpath to get a list of filenames and <script ../>
afterwards to get your distinct list.

<?xml version="1.0" encoding="UTF-8"?>
<project>
<!-- Import XMLTask -->
<taskdef name="xmltask"
classname="com.oopsconsultancy.xmltask.ant.XmlTask"/>

<target name="depends">
  <xmltask source="./jars.xml">
    <copy path="//files/text()"
          append="true"
          property="alljars"
    />
  </xmltask>
	
	<echo>${alljars}</echo>
	
  <script language="beanshell">
  <![CDATA[
    String[] jars = project.getProperty("alljars").split(" ");
    Set set = new HashSet(Arrays.asList(jars));
    String[] distinct = (set.toArray(new String[set.size()]));
    project.setProperty("uniquejars", Arrays.toString(distinct));
  ]]>
  </script>

</target>

<target name="main" depends="depends">
  <echo>$${uniquejars} = ${uniquejars}${line.separator}</echo>
  <echo>$${alljars} = ${alljars}</echo>
</target>
</project>

Buildfile: /workspace/ant/foobar.xml
depends:
  [xmltask] Cannot append values to properties
     [echo] antlr-2.7.6.jar antlr-2.7.6.jar antlr-2.7.6.jar
aopalliance-1.0.jar aopalliance-1.0.jar aopalliance-1.0.jar
commons-validator-1.0.2.jar commons-validator-1.0.2.jar
commons-lang-2.2.jar commons-lang-2.2.jar
main:
     [echo] ${uniquejars} = [antlr-2.7.6.jar, commons-lang-2.2.jar
     [echo] , aopalliance-1.0.jar, commons-lang-2.2.jar,
commons-validator-1.0.2.jar]
     [echo] ${alljars} = antlr-2.7.6.jar antlr-2.7.6.jar antlr-2.7.6.jar
aopalliance-1.0.jar aopalliance-1.0.jar aopalliance-1.0.jar
commons-validator-1.0.2.jar commons-validator-1.0.2.jar
commons-lang-2.2.jar commons-lang-2.2.jar
BUILD SUCCESSFUL
Total time: 902 milliseconds


2) influence how the XML file is created, if possible, so that it gets
a structure that is more suitable for XPath, and then simply make use
of XPath, e.g.

<?xml version="1.0" encoding="UTF-8"?>
<jars>
<file>antlr-2.7.6.jar</file>
<file>aopalliance-1.0.jar</file>
<file>antlr-2.7.6.jar</file>
<file>commons-validator-1.0.2.jar</file>
<file>commons-validator-1.0.2.jar</file>
<file>commons-lang-2.2.jar</file>
<file>whatever-0.0.1</file>
<file>commons-lang-2.2.jar</file>
<file>whatever-0.0.1</file>
<file>aopalliance-1.0.jar</file>
</jars>

with some XPath[2] tricks (sadly the XPath 2.0 function
distinct-values(...) doesn't work):

<?xml version="1.0" encoding="UTF-8"?>
<project>
<!-- Import XMLTask -->
<taskdef name="xmltask"
classname="com.oopsconsultancy.xmltask.ant.XmlTask"/>

<target name="depends">
  <xmltask source="./jars.xml">
    <copy path="//file/text()"
          append="true"
     	    property="alljars"
          propertyseparator="${line.separator}"
    />
    <copy path="//file/text()[not(preceding::file/text() = .)]"
    	    append="true"
     	    property="jars"
          propertyseparator="${line.separator}"
    />
  </xmltask>
</target>

<target name="main" depends="depends">
  <echo>--- $${alljars} ---${line.separator}${alljars}</echo>
  <echo>${line.separator}--- $${jars} ---${line.separator}${jars}</echo>
</target>
</project>

you'll get your distinct list :
Buildfile: /workspace/ant/foobar.xml
depends:
  [xmltask] Cannot append values to properties
   ... don't be annoyed by these messages, simply ignore them
   or do a search in the xmlproperty task sources and comment it out

main:
     [echo] --- ${alljars} ---
     [echo] antlr-2.7.6.jar
     [echo] aopalliance-1.0.jar
     [echo] antlr-2.7.6.jar
     [echo] commons-validator-1.0.2.jar
     [echo] commons-validator-1.0.2.jar
     [echo] commons-lang-2.2.jar
     [echo] whatever-0.0.1
     [echo] commons-lang-2.2.jar
     [echo] whatever-0.0.1
     [echo] aopalliance-1.0.jar
     [echo]
     [echo] --- ${jars} ---
     [echo] antlr-2.7.6.jar
     [echo] aopalliance-1.0.jar
     [echo] commons-validator-1.0.2.jar
     [echo] commons-lang-2.2.jar
     [echo] whatever-0.0.1
BUILD SUCCESSFUL
Total time: 603 milliseconds
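
The predicate in the second <copy> is plain XPath 1.0, so the same trick can be tried outside xmltask. Below is a small sketch using the JDK's javax.xml.xpath API; the class and method names are made up for illustration, and the XML is passed in as a string only to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DistinctByXPath {
    // Select only those <file> text nodes whose value has not occurred in an earlier <file>.
    static List<String> distinct(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//file/text()[not(preceding::file/text() = .)]",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeValue());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<jars><file>antlr-2.7.6.jar</file><file>aopalliance-1.0.jar</file>"
                   + "<file>antlr-2.7.6.jar</file></jars>";
        System.out.println(distinct(xml));
    }
}
```

Note that the preceding:: comparison looks back over all earlier <file> elements for every node, so it is fine for a list of jars but will get slow on very large documents.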



[1] http://www.oopsconsultancy.com/software/xmltask/
[2] http://www.zvon.org/xxl/XPathTutorial/General/examples.html


Regards, Gilbert






AW: Remove duplicate JAR file names from an XML file

Posted by Ja...@rzf.fin-nrw.de.
>I have an xml file which lists JAR file names in it with space 
>specified as a delimiter. There are duplications of JAR files 
>name in the file and I was wondering if there is a way to some 
>how get ride of the duplication, to get a file which has a 
>unique set of JAR file names?
> 
>
>Example of file:
>
>antlr-2.7.6.jar antlr-2.7.6.jar antlr-2.7.6.jar 
>aopalliance-1.0.jar aopalliance-1.0.jar aopalliance-1.0.jar 
>commons-validator-1.0.2.jar commons-validator-1.0.2.jar 
>commons-lang-2.2.jar commons-lang-2.2.jar


I don't see any built-in possibility, so you have to script/program your
own task.
- you could read the XML with <xmlproperty>
- address the list as a property
- split the property value and store it in a set
- change the set into a space delimited list and store it as property
- copy the xml file with replacing ${xml-property} by
${calculated-property}
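
The middle steps could look roughly like this if written as the core of a custom task in Java. This is a sketch under assumptions: the list is assumed to live in a <files> element (the original post does not show the real structure), the JDK DOM parser stands in for <xmlproperty>, and the class and method names are invented. Writing the result back into the file is left out.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class RewriteJarList {
    // Read the list, split it, store it in a set, join it back with spaces.
    static String dedupe(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Assumption: the first <files> element holds the whitespace-separated list.
            String raw = doc.getElementsByTagName("files").item(0).getTextContent();
            // LinkedHashSet drops duplicates and keeps the first-seen order.
            Set<String> unique = new LinkedHashSet<>(Arrays.asList(raw.trim().split("\\s+")));
            return String.join(" ", unique);
        } catch (Exception e) {
            throw new IllegalStateException("could not read the jar list", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<jars><files>antlr-2.7.6.jar antlr-2.7.6.jar\n"
                   + "commons-lang-2.2.jar commons-lang-2.2.jar</files></jars>";
        System.out.println(dedupe(xml));
    }
}
```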

Just my 5ct ;)


Jan



Re: Remove duplicate JAR file names from an XML file

Posted by David Weintraub <qa...@gmail.com>.
On Mon, Jul 26, 2010 at 2:22 PM, Rao Chaudhri <rc...@1fbusa.com> wrote:
> I have an xml file which lists JAR file names in it with space specified as a delimiter. There are duplications of JAR files name in the file and I was wondering if there is a way to some how get ride of the duplication, to get a file which has a unique set of JAR file names?
>
>
>
> Example of file:
>
> temp.xml:
>
> antlr-2.7.6.jar antlr-2.7.6.jar antlr-2.7.6.jar aopalliance-1.0.jar aopalliance-1.0.jar aopalliance-1.0.jar commons-validator-1.0.2.jar commons-validator-1.0.2.jar commons-lang-2.2.jar commons-lang-2.2.jar

I'm confused since you really didn't specify exactly what it is:

* Can you show some of the structure of the XML file, so I can look at
  what has to be parsed.
* Is this a generated list at build time, or is this something that is
  checked into your repository?
* If this is a generated list at build time, exactly where are you
  getting the information for this XML?

This will help me understand the possible issues involved. It would be
better if we could correct the duplication before they were put into
the XML file instead of parsing them out afterwards.

-- 
David Weintraub
qazwart@gmail.com
