Posted to java-user@lucene.apache.org by "Sudarsan, Sithu D." <Si...@fda.hhs.gov> on 2009/05/21 16:42:59 UTC
Parsing large xml files
Hi,
While trying to parse XML documents of about 50 MB in size, we run into an OutOfMemoryError due to Java heap space. Increasing the JVM heap to close to 2 GB (that is the max) does not help. Is there any API that could be used to handle such large single XML files?
If Lucene is not the right place to ask, please let me know of alternate places to look.
Thanks in advance,
Sithu D Sudarsan
sithu.sudarsan@fda.hhs.gov
sdsudarsan@ualr.edu
Re: Parsing large xml files
Posted by Erick Erickson <er...@gmail.com>.
What fails and what is the stack trace? Have you tried just
parsing the XML in a stand-alone program independent of
indexing?
You should easily be able to parse a 50 MB file with that much memory. I suspect something else is going on here. Perhaps you're not *really* allocating that much memory to the process. If you're working in an IDE, for instance, you could be allocating memory to the IDE but not setting the correct runtime parameters for programs run within that IDE.
If that is irrelevant, perhaps you could add more details...
Best
Erick
RE: Parsing large xml files
Posted by "Sudarsan, Sithu D." <Si...@fda.hhs.gov>.
Thanks, I'll try that and get back to you.
Sincerely,
Sithu D Sudarsan
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Parsing large xml files
Posted by Joel Halbert <jo...@su3analytics.com>.
Try http://piccolo.sourceforge.net/
It is small and fast.
Re: Parsing large xml files
Posted by Michael Barbarelli <mb...@gmail.com>.
Why not use an XML pull parser? I recommend against using an in-memory
parser.
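A sketch of the pull-parser approach using StAX (`javax.xml.stream`, bundled with the JDK since Java 6). This is an illustration, not code from the thread; the `<title>` element name is a made-up example:

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class PullParseDemo {

    // Pull events one at a time; memory use stays flat no matter how
    // large the input is, because no tree is ever built.
    public static List<String> collectTitles(Reader source) throws Exception {
        List<String> titles = new ArrayList<String>();
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(source);
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && "title".equals(r.getLocalName())) {
                titles.add(r.getElementText());
            }
        }
        r.close();
        return titles;
    }

    public static void main(String[] args) throws Exception {
        // For a real 50 MB file, pass a FileReader here instead of a StringReader.
        String xml = "<docs><doc><title>t1</title></doc>"
                   + "<doc><title>t2</title></doc></docs>";
        System.out.println(collectTitles(new StringReader(xml))); // [t1, t2]
    }
}
```

The key point is that the application asks for the next event when it is ready, rather than a DOM parser loading the entire 50 MB document into a tree first.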
Re: Parsing large xml files
Posted by prasanna pradhan <pr...@gmail.com>.
We had a similar problem where we had to parse 1 GB XML files. It is better to transform the XML into an array-like format such as JSON and write a custom search API using Lucene.
--
Thanks,
Prasanna
Re: Parsing large xml files
Posted by Matthew Hall <mh...@informatics.jax.org>.
Yeah, there's a setting on Windows that allows you to use up to, erm, 3 GB I think it was. The limitation there is due to how 32-bit Windows splits the process address space. I don't remember offhand exactly what that setting was, but I'm 100% certain that it's there.
If you do a Google search for JVM maximum memory settings on Windows, you should be able to find a few articles about it. (At least that's certainly my recollection.)
Secondly, if you have a Linux machine available you should probably just use that, particularly if it has a 64-bit processor, because then a whole ton more memory becomes available to you.
When I'm developing my indexes I do it via Eclipse on my Windows platform, but with the actual directories mounted from a Solaris machine. When I go to actually MAKE the indexes I simply log in to that machine, do a quick ant compile, and run them. Sure, it's an extra step, but the gains are more than worth it in our case.
Matt
RE: Parsing large xml files
Posted by "Sudarsan, Sithu D." <Si...@fda.hhs.gov>.
Hi Matt,
We use a 32-bit JVM. Though it is supposed to be able to address up to 4 GB, any heap assignment above 2 GB on Windows XP fails. The machine has dual quad-core processors.
On Linux we're able to use 4 GB though!
If there is any setting that will let us use 4 GB, do let me know.
Thanks,
Sithu D Sudarsan
Re: Parsing large xml files
Posted by Matthew Hall <mh...@informatics.jax.org>.
2 GB should not be a maximum for any JVM that I know of.
Assuming you are running a 32-bit JVM, you can actually address a bit under 4 GB of memory; I've always used around 3.6 GB when trying to max out a 32-bit JVM. Technically it should be able to address 4 GB under a 32-bit OS, but a certain percentage of the address space is set aside for overhead, so you can only really use a bit less than the max.
If you have a 64-bit OS/JVM (which you likely might), you can use the -d64 setting for your runtime environment to set your maximum memory much, MUCH higher; for example, we regularly use 6 GB of memory on our application servers here at the lab.
Hope this helps you a bit,
Hope this helps you a bit,
Matt
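One quick way to check whether an -Xmx setting actually took effect (my own sketch, assuming a standard Sun/Oracle JVM) is to print the heap ceiling the JVM granted:

```java
public class HeapCheck {
    public static void main(String[] args) {
        // maxMemory() reports the ceiling the JVM will attempt to use,
        // i.e. what -Xmx actually resolved to on this OS/JVM combination.
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap: " + maxMb + " MB");
    }
}
```

Run it as, say, `java -Xmx3600m HeapCheck` on a 32-bit JVM, or `java -d64 -Xmx6g HeapCheck` on a 64-bit one, and compare the printed figure with what you requested.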
Re: Parsing large xml files
Posted by cr...@comcast.net.
Yes, that is something worth thinking about. Thanks for bringing this up.
Re: Parsing large xml files
Posted by Michael Wechner <mi...@wyona.com>.
crackeur@comcast.net wrote:
> Once people get comfortable with vtd-xml, few ever go back to DOM and SAX...
>
Maybe you want to consider contributing a vtd-xml based parsing implementation to Lucene ;-)
Thanks
Michael
Re: Parsing large xml files
Posted by cr...@comcast.net.
Once people get comfortable with vtd-xml, few ever go back to DOM and SAX...
RE: Parsing large xml files
Posted by "Sudarsan, Sithu D." <Si...@fda.hhs.gov>.
Thanks, everyone, for your useful suggestions/links.
Lucene uses DOM, and we tried with SAX.
XML Pull and vtd-xml, as well as Piccolo, seem good.
However, for now, we've broken the file into smaller chunks and are parsing those.
When we get some time, we'd like to refactor with the suggested ones.
Erick: We do use Eclipse, but running from the CLI gives the same error! Maybe there is a way to address the memory issues, but the current approach of breaking the file into smaller chunks has worked for now...
Sincerely,
Sithu D Sudarsan
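That chunking workaround can be sketched as follows. This is my own illustration, not the original code: the `<records>`/`<record>` element names are hypothetical, and a production version would stream from disk rather than hold the whole document in a String (which would defeat the purpose for a 50 MB file).

```java
import java.util.ArrayList;
import java.util.List;

// Cut a large XML document into smaller self-contained documents at a
// record-element boundary, so each piece fits an ordinary in-memory parser.
public class XmlChunker {

    // Groups every recordsPerChunk <record> elements into one chunk,
    // wrapping each chunk in its own <records> root element.
    public static List<String> split(String xml, int recordsPerChunk) {
        List<String> chunks = new ArrayList<String>();
        StringBuilder current = new StringBuilder("<records>");
        int count = 0;
        int pos = 0;
        while (true) {
            int start = xml.indexOf("<record>", pos);
            if (start < 0) break;
            int end = xml.indexOf("</record>", start);
            if (end < 0) break;
            end += "</record>".length();
            current.append(xml, start, end);
            pos = end;
            if (++count == recordsPerChunk) {
                chunks.add(current.append("</records>").toString());
                current = new StringBuilder("<records>");
                count = 0;
            }
        }
        if (count > 0) {
            chunks.add(current.append("</records>").toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String xml = "<records><record>a</record><record>b</record>"
                   + "<record>c</record></records>";
        System.out.println(split(xml, 2).size() + " chunks"); // 2 chunks
    }
}
```

Each chunk is a well-formed document in its own right, so the existing parsing and indexing code can be reused on the pieces unchanged.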
Re: Parsing large xml files
Posted by Michael Wechner <mi...@wyona.com>.
crackeur@comcast.net wrote:
> http://vtd-xml.sf.net
I am not familiar with that particular part of Lucene's code, but is it possible that Lucene is using DOM for this parsing?
If so, one could try replacing it with SAX, and hence get rid of the OutOfMemory issue.
Cheers
Michael
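A minimal sketch of that SAX approach, using the `org.xml.sax` API bundled with the JDK (the `<title>` element name here is a made-up example). The handler only reacts to events and never builds a tree, so memory use stays constant regardless of file size:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTitleHandler extends DefaultHandler {
    final StringBuilder titles = new StringBuilder();
    private boolean inTitle;

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if ("title".equals(qName)) inTitle = true;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so we accumulate.
        if (inTitle) titles.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if ("title".equals(qName)) {
            inTitle = false;
            titles.append('\n');
        }
    }

    public static void main(String[] args) throws Exception {
        SaxTitleHandler handler = new SaxTitleHandler();
        // For a 50 MB file, parse a FileInputStream instead of this string.
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader("<docs><title>t1</title></docs>")),
                handler);
        System.out.print(handler.titles); // t1
    }
}
```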
Re: Parsing large xml files
Posted by cr...@comcast.net.
http://vtd-xml.sf.net