Posted to user@commons.apache.org by Patrick Diviacco <pa...@gmail.com> on 2011/04/05 13:08:37 UTC

[digester] digester performance..

hi,

I have a Java app, and I recently stopped using Digester because all my
data is now kept in RAM and I no longer need to write or parse XML files.

However, since I stopped using Digester and external XML files, the
performance of my app has gotten worse.

I now have the same data stored in an ArrayList<ArrayList<String>>, and I
iterate over it with a for loop.

Before, the data was in an XML file with the following structure:

<collection>
<doc>
<field1></field1>
..
</doc>
..
</collection>
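
For comparison, the in-memory version described above looks roughly like
this (a minimal sketch; the sample values and the processDoc step are
hypothetical, not the actual application code):

// One inner list per <doc>, one String per field, walked with plain for loops.
ArrayList<ArrayList<String>> collection = new ArrayList<ArrayList<String>>();

ArrayList<String> doc = new ArrayList<String>();
doc.add("field1 value"); // hypothetical field content
doc.add("field2 value");
collection.add(doc);

for (int i = 0; i < collection.size(); i++) {
    ArrayList<String> fields = collection.get(i);
    for (int j = 0; j < fields.size(); j++) {
        processDoc(i, fields.get(j)); // hypothetical per-field work
    }
}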

Is Digester really that much faster at iterating over my data from an XML
file than a for loop over an ArrayList with the same content?

thanks

Re: [digester] digester performance..

Posted by Simone Tripodi <si...@apache.org>.
I'd discourage XPath, since it implies keeping the whole DOM in memory; if
the XML document Patrick is parsing is large (thousands and thousands of
megabytes), XPath is not efficient either.
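
To make the memory cost concrete, here is a minimal sketch (not from the
thread; the file name is hypothetical) of the standard javax.xml XPath
route, which materializes the full DOM before any expression can be
evaluated:

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathMemoryDemo {
    public static void main(String[] args) throws Exception {
        // The whole file is parsed into an in-memory DOM tree first...
        Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new File("collection.xml"));

        // ...and only then can XPath expressions run against it.
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList fields = (NodeList) xpath.evaluate(
                "/collection/doc/field1", dom, XPathConstants.NODESET);
        System.out.println("field1 count: " + fields.getLength());
    }
}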

Patrick, honestly I didn't understand the problem :) It sounds like a
Lucene performance problem; have you already tried asking on the Lucene
mailing list?
Simo

http://people.apache.org/~simonetripodi/
http://www.99soft.org/



On Tue, Apr 5, 2011 at 11:24 PM, Jimmy Zhang <cr...@comcast.net> wrote:
> Have you considered using xpath instead of digester?



Re: [digester] digester performance..

Posted by Jimmy Zhang <cr...@comcast.net>.
Have you considered using xpath instead of digester?





Re: [digester] digester performance..

Posted by Patrick Diviacco <pa...@gmail.com>.
Hi, no, it is not the same program.
I'm basically calling the method below in a for loop.

In my first app I invoked it only once over the entire index (30 rows),
and it took 2 minutes.

Now I'm calling it in a loop for each row, because I need to update my
index, which keeps growing (first iteration 1 row, then 2... then 2 again,
then 3... and so on; it is a clustering algorithm and each row is a cluster).

It is supposed to be slow, but I'm surprised it takes more than 1 hour.
Thanks


public static void performQuery(QueryDoc queryDoc) throws java.io.IOException {

    // One top-level query per call; 'true' disables coord scoring.
    BooleanQuery booleanQuery = new BooleanQuery(true);

    // Match every document so nothing is filtered out up front.
    Query notRelevant = new MatchAllDocsQuery();
    booleanQuery.add(notRelevant, BooleanClause.Occur.SHOULD);

    try {
        // getTitle(), getDescription() and getTags() are assumed to return String[].
        // Note that a new QueryParser (and WhitespaceAnalyzer) is constructed
        // for every single phrase, on every call.
        String[] phrase = queryDoc.getTitle();
        for (int i = 0; i < phrase.length; i++) {
            booleanQuery.add(new QueryParser(
                    org.apache.lucene.util.Version.LUCENE_40, "title",
                    new WhitespaceAnalyzer(org.apache.lucene.util.Version.LUCENE_40))
                    .parse(phrase[i]),
                    BooleanClause.Occur.SHOULD);
        }

        // Description phrases.
        phrase = queryDoc.getDescription();
        for (int i = 0; i < phrase.length; i++) {
            booleanQuery.add(new QueryParser(
                    org.apache.lucene.util.Version.LUCENE_40, "description",
                    new WhitespaceAnalyzer(org.apache.lucene.util.Version.LUCENE_40))
                    .parse(phrase[i]),
                    BooleanClause.Occur.SHOULD);
        }

        // time = new TermQuery(new Term("time", queryDoc.getTime()));
        // booleanQuery.add(time, BooleanClause.Occur.SHOULD);

        // Tag phrases.
        phrase = queryDoc.getTags();
        for (int i = 0; i < phrase.length; i++) {
            booleanQuery.add(new QueryParser(
                    org.apache.lucene.util.Version.LUCENE_40, "tags",
                    new WhitespaceAnalyzer(org.apache.lucene.util.Version.LUCENE_40))
                    .parse(phrase[i]),
                    BooleanClause.Occur.SHOULD);
        }

    } catch (ParseException pe) {
        // Parse errors are silently swallowed.
        // System.out.println(pe.getMessage());
    }

    // 'searcher' is presumably an IndexSearcher held as a class field;
    // up to 220000 hits are collected on every call.
    TopDocs topDocs = searcher.search(booleanQuery, 220000);
    writeResults(topDocs, queryDoc);
}
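
One detail worth flagging in the loops above: a fresh WhitespaceAnalyzer
and QueryParser are built for every phrase on every call. A rough sketch
of a hypothetical helper that builds the parser once per field (same
Lucene 4.0 classes assumed; this is not code from the thread) could look
like:

// Hypothetical helper, not from the original mail: one QueryParser per
// field, reused for all phrases of that field.
private static void addPhrases(BooleanQuery booleanQuery, String field, String[] phrases)
        throws ParseException {
    QueryParser parser = new QueryParser(
            org.apache.lucene.util.Version.LUCENE_40, field,
            new WhitespaceAnalyzer(org.apache.lucene.util.Version.LUCENE_40));
    for (int i = 0; i < phrases.length; i++) {
        booleanQuery.add(parser.parse(phrases[i]), BooleanClause.Occur.SHOULD);
    }
}

// The three loops in performQuery would then collapse to:
// addPhrases(booleanQuery, "title", queryDoc.getTitle());
// addPhrases(booleanQuery, "description", queryDoc.getDescription());
// addPhrases(booleanQuery, "tags", queryDoc.getTags());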


Re: [digester] digester performance..

Posted by Simone Tripodi <si...@apache.org>.
Hi Patrick,
if the Digester program you're speaking about is the one you pasted here
some time ago... well, there were a lot of missed optimizations. For
example, I suggested using the Lucene rules instead of storing all the
properties in a POJO and then creating the Lucene Document; that way you
limit the amount of stored data.

When parsing a large XML document, as in your case, I suggest mapping to
objects as little as possible and streaming more.
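
As a rough illustration of that idea (a hedged sketch, not code from the
thread; the indexWriter variable, the "field1" mapping and the file name
are assumptions, and the Field constructor shown is the Lucene 3.x style):

// Assumes imports from org.apache.commons.digester, org.apache.lucene.document,
// org.apache.lucene.index and org.xml.sax.
// Custom Digester rules that turn each <doc> element directly into a Lucene
// Document and index it right away, instead of building a POJO per document.
Digester digester = new Digester();
digester.push(indexWriter); // an already-open IndexWriter, shared with the rules below

digester.addRule("collection/doc", new Rule() {
    public void begin(String namespace, String name, Attributes attributes) {
        getDigester().push(new Document()); // start a fresh Lucene Document per <doc>
    }
    public void end(String namespace, String name) throws Exception {
        Document doc = (Document) getDigester().pop();
        ((IndexWriter) getDigester().peek()).addDocument(doc); // index and forget
    }
});

digester.addRule("collection/doc/field1", new Rule() {
    public void body(String namespace, String name, String text) {
        Document doc = (Document) getDigester().peek();
        // Lucene 3.x-style field construction; newer versions use TextField instead.
        doc.add(new Field("field1", text, Field.Store.YES, Field.Index.ANALYZED));
    }
});

digester.parse(new File("collection.xml"));

Because each Document is handed to the IndexWriter inside end(), only the
element currently being parsed has to stay in memory.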

HTH,
Simo

http://people.apache.org/~simonetripodi/
http://www.99soft.org/






Re: [digester] digester performance..

Posted by Weiwei Wang <ww...@gmail.com>.
I don't think your program became slower because you are not using
Digester; RAM should be much faster. I suggest you reduce the main part of
your program to something simple and paste it in the email so that others
can help.




-- 
王巍巍
Cell: 18911288489
MSN: ww.wang.cs@gmail.com
Blog: http://whisper.eyesay.org
Weibo: http://t.sina.com/lolorosa