Posted to user@poi.apache.org by Shiva Kumar <sh...@pawaa.com> on 2015/07/27 09:38:44 UTC

Efficient way to read shared strings.

Hi, I am using the XLSX2CSV class, with some modifications, to convert XLSX
to HTML. The method I have a question about is below.

 

public void process()
        throws IOException, OpenXML4JException,
               ParserConfigurationException, SAXException, XMLStreamException {

    // -> Is there a more memory-efficient solution for this?
    ReadOnlySharedStringsTable strings =
            new ReadOnlySharedStringsTable(this.xlsxPackage);
    XSSFReader xssfReader = new XSSFReader(this.xlsxPackage);
    StylesTable styles = xssfReader.getStylesTable();
    XSSFReader.SheetIterator iter =
            (XSSFReader.SheetIterator) xssfReader.getSheetsData();

    // This is my code, which writes the HTML.
    htmlWriter.writeStartElement("html");
    htmlWriter.writeCharacters(System.lineSeparator());
    htmlWriter.writeStartElement("head");
    htmlWriter.writeCharacters(System.lineSeparator());
    htmlWriter.writeStartElement("title");
    htmlWriter.writeCharacters(name);
    htmlWriter.writeEndElement();
    htmlWriter.writeEndElement();

    // This is my code, which writes the body of the HTML.
    htmlWriter.writeStartElement("body");

    int index = 0;
    while (iter.hasNext()) {
        InputStream stream = iter.next();
        String sheetName = iter.getSheetName();
        // Processing of each sheet is a custom implementation.
        processSheet(styles, strings, stream, sheetName, index);
        stream.close();
        ++index;
    }
}

 

PROBLEM DESCRIPTION:

The process() method uses the ReadOnlySharedStringsTable class, which reads
"sharedStrings.xml" and stores every string in a list. This causes memory
issues for very large files with large strings. Does POI provide a more
memory-efficient way to do this?

 

Thank You


Re: Efficient way to read shared strings.

Posted by Dominik Stadler <do...@gmx.at>.
Sorry, my previous post was meant for another thread; please disregard it.

You probably need to come up with a different implementation of the
SharedStringsTable, one that does not keep all the data in memory. However,
the best implementation likely depends largely on the type of input data
that you process with it, so it will be hard to provide an efficient
disk-based implementation for the general case.
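
As a rough illustration of the disk-based idea, here is a minimal sketch
that uses only the JDK. The class DiskBackedStringTable and its methods
add(), get() and size() are hypothetical names and are not part of POI. It
assumes strings are added once, in index order, and looked up by index
afterwards. How such a table would be plugged into the sheet handler
depends on the POI version in use and is not shown here.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical disk-backed string table (NOT a POI class).
 * Only one long offset per string stays on the heap; the text itself
 * lives in a temporary file and is read back on demand.
 */
class DiskBackedStringTable implements Closeable {
    private final File file;
    private final RandomAccessFile raf;
    private long[] offsets = new long[1024]; // start offset of each entry
    private int count = 0;                   // number of strings stored
    private long end = 0;                    // current end of the data file

    DiskBackedStringTable() throws IOException {
        file = File.createTempFile("sharedStrings", ".bin");
        file.deleteOnExit();
        raf = new RandomAccessFile(file, "rw");
    }

    /** Appends a string and returns its 0-based index (insertion order). */
    int add(String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        if (count == offsets.length) {
            offsets = Arrays.copyOf(offsets, count * 2);
        }
        offsets[count] = end;
        raf.seek(end);
        raf.writeInt(bytes.length); // 4-byte length prefix
        raf.write(bytes);
        end += 4 + bytes.length;
        return count++;
    }

    /** Reads the string stored at the given index back from disk. */
    String get(int index) throws IOException {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + count);
        }
        raf.seek(offsets[index]);
        byte[] bytes = new byte[raf.readInt()];
        raf.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    int size() {
        return count;
    }

    @Override
    public void close() throws IOException {
        raf.close();
        file.delete();
    }
}
```

This trades heap usage for one seek and one read per lookup, and the
writes are unbuffered, so it is likely to be slow for sheets that
reference strings in random order. Whether that is acceptable, or whether
a small in-memory cache in front of it is needed, depends on the input
data.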

Dominik.


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: Efficient way to read shared strings.

Posted by Dominik Stadler <do...@gmx.at>.
Hi,

Sounds like a useful tool. I don't know of such a project already
existing, but it should be quite straightforward with the POI
interfaces.

A related project is https://github.com/centic9/poi-mail-merge which
does something similar, providing mail merge for Word documents,
although it needs to use somewhat lower-level interfaces to do the
replacements.

For Excel files you should be able to use the normal high-level
interfaces to look through the contents of the template,
replace/insert all necessary things and write the result to a new
document.

Dominik.
