Posted to dev@poi.apache.org by Renjith R <re...@gmail.com> on 2016/12/23 01:27:32 UTC

[Suggestion] Enhancement for reading big excel files

Hi Developers,

    A couple of years ago I suggested an enhancement for reading very large
Excel files using the StAX API. The document is attached. Unfortunately, I
did not get a chance to work on it. Do you think it makes sense for me to
start working on it now? Kindly let me know your suggestions.

regards,
Renjith
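
As a rough illustration of the StAX idea mentioned above (this is not the
code from the attached proposal, only a minimal sketch under assumptions): a
worksheet part inside an .xlsx file is XML, so a pull parser can walk its
cells one event at a time without building the whole sheet in memory. The
class name StaxSheetSketch, the readCells method and the inline sample XML
are hypothetical; only the JDK's javax.xml.stream API is used, no POI classes.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.BiConsumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxSheetSketch {

    /**
     * Streams over worksheet XML and passes each cell reference and raw
     * <v> content to the handler. Nothing is accumulated here, so memory
     * use does not grow with the number of rows.
     */
    static void readCells(InputStream sheetXml, BiConsumer<String, String> handler)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Basic hardening: do not process DTDs or external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        XMLStreamReader reader = factory.createXMLStreamReader(sheetXml);
        try {
            String currentRef = null;
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = reader.getLocalName();
                if ("c".equals(name)) {
                    // <c r="A1"> starts a cell; remember its reference.
                    currentRef = reader.getAttributeValue(null, "r");
                } else if ("v".equals(name)) {
                    // <v> holds the raw value. For cells with t="s" this is an
                    // index into the shared strings table, not the text itself.
                    handler.accept(currentRef, reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, heavily trimmed stand-in for a sheet part.
        String xml =
            "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">"
            + "<sheetData><row r=\"1\">"
            + "<c r=\"A1\"><v>42</v></c>"
            + "<c r=\"B1\" t=\"s\"><v>0</v></c>"
            + "</row></sheetData></worksheet>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        readCells(in, (ref, value) -> System.out.println(ref + " -> " + value));
    }
}
```

A real reader would additionally have to resolve shared strings, styles and
inline strings, which this sketch deliberately leaves out.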

Re: [Suggestion] Enhancement for reading big excel files

Posted by Dominik Stadler <do...@gmx.at>.
Hi,

The attachment did not make it through, can you resend?

Dominik.

On Fri, Dec 23, 2016 at 2:27 AM, Renjith R <re...@gmail.com>
wrote:

> Hi Developers,
>
>     A couple of years ago I suggested an enhancement for reading very large
> Excel files using the StAX API. The document is attached. Unfortunately, I
> did not get a chance to work on it. Do you think it makes sense for me to
> start working on it now? Kindly let me know your suggestions.
>
> regards,
> Renjith
>

Re: [Suggestion] Enhancement for reading big excel files

Posted by Dominik Stadler <do...@gmx.at>.
Hi,

Sorry for the long delay; I have been busy with other things lately and
none of the other committers stepped in.

Let's see some first code then. I am not sure that adding more complexity to
SXSSFWorkbook is a good thing; maybe we can at least separate out most of
the functionality cleanly so as not to clutter the already large class with
more code.

Please show it whenever you have at least some code in place. It does not
need to be complete, just a first proof of concept, so we can iterate on it
and ensure it fits into the overall code structure from the start.

Thanks... Dominik.

On Wed, Jan 4, 2017 at 4:35 AM, Renjith R <re...@gmail.com>
wrote:

> Thanks a lot for the comments, Dominik.
>
> To answer your questions..
>
> * How would you ensure feature-parity compared to HSSF/XSSF implementation?
> There are a large number of things that are possible in a workbook, do you
> plan to support all those or only a subset?
>
> Well, I would like to start with the XSSF implementation, as I am not very familiar with the HSSF one.
>
> I am not looking to support only a subset, because no one is going to use it unless it supports at least the basic functionality.
>
> * The text seems to indicate that there is already some code
> available. Can we take a look? You can start a fork of Apache POI from https://github.com/apache/poi/ easily and do the changes there so others
> can take a look and suggest improvements/changes. Or is it a standalone
> piece of code?
>
> I have no plans for standalone code as long as we can incorporate it into the existing functionality. Since we already have a class (org.apache.poi.xssf.streaming.SXSSFWorkbook) that is dedicated to reducing memory consumption, I would like to start with it and see whether this can be added to it as a feature. I will also take a look at the code to see if we can leverage any existing functionality.
>
> * How would you ensure that the code is maintained over time? As this
> sounds like quite a large chunk of code, are you planning to continue to
> invest some time in the long run? We had some cases where code was
> "donated", but never looked at afterwards, which is bad as it increases the
> code-base, but also increases number of bug-reports and areas that are not
> well covered by tests.
>
> :). I am not looking for a 'code donation' here. I'll be around for a long time.

Re: [Suggestion] Enhancement for reading big excel files

Posted by Renjith R <re...@gmail.com>.
Thanks a lot for the comments, Dominik.

To answer your questions..

* How would you ensure feature-parity compared to HSSF/XSSF implementation?
There are a large number of things that are possible in a workbook, do you
plan to support all those or only a subset?

Well, I would like to start with the XSSF implementation, as I am
not very familiar with the HSSF one.

I am not looking to support only a subset, because no one is going to use
it unless it supports at least the basic functionality.

* The text seems to indicate that there is already some code
available. Can we take a look? You can start a fork of Apache POI
from https://github.com/apache/poi/ easily and do the changes there so
others can take a look and suggest improvements/changes. Or is it a
standalone piece of code?

I have no plans for standalone code as long as we can incorporate it into
the existing functionality. Since we already have a class
(org.apache.poi.xssf.streaming.SXSSFWorkbook) that is dedicated to
reducing memory consumption, I would like to start with it and see whether
this can be added to it as a feature. I will also take a look at the
code to see if we can leverage any existing functionality.

* How would you ensure that the code is maintained over time? As this
sounds like quite a large chunk of code, are you planning to continue to
invest some time in the long run? We had some cases where code was
"donated", but never looked at afterwards, which is bad as it increases the
code-base, but also increases number of bug-reports and areas that are not
well covered by tests.

:). I am not looking for a 'code donation' here. I'll be around for a
long time.


On Sun, Dec 25, 2016 at 4:19 PM, Renjith R <re...@gmail.com>
wrote:

> I don't know if you are able to see the screenshot in my previous mail.
> The following was your comment.
> I will start working on it if you think it is worth adding.
>
> *From: *Dominik Stadler <d....@gmx.at>
> *Subject: *Re: Suggestion on how to read huge excel files.
> *Date: *2015-06-20 15:24 (+0530)
> *List: *user@poi.apache.org
> <ht...@poi.apache.org>
>
> It seems not that many people need similar functionality currently,
> however it looks useful for handling very large documents.
>
> I looked at it and it looks good, some comments:
>
> * The finalize() in the Beans looks strange and should not be needed,
> these members are freed anyway and having to implement finalize()
> always looks fishy!
>
> Thanks... Dominik.

Re: [Suggestion] Enhancement for reading big excel files

Posted by Dominik Stadler <do...@gmx.at>.
Hi,

Thanks for the detailed writeup. The functionality looks like it would be
useful in many cases where memory is limited and files are large; we
encounter these regularly on the mailing lists and Stack Overflow.

A few questions upfront:

* How would you ensure feature-parity compared to HSSF/XSSF implementation?
There are a large number of things that are possible in a workbook, do you
plan to support all those or only a subset?
* The text seems to indicate that there is already some code
available. Can we take a look? You can start a fork of Apache POI from
https://github.com/apache/poi/ easily and do the changes there so others
can take a look and suggest improvements/changes. Or is it a standalone
piece of code?
* How would you ensure that the code is maintained over time? As this
sounds like quite a large chunk of code, are you planning to continue to
invest some time in the long run? We had some cases where code was
"donated", but never looked at afterwards, which is bad as it increases the
code-base, but also increases number of bug-reports and areas that are not
well covered by tests.

Thanks... Dominik.

On Sun, Dec 25, 2016 at 11:49 AM, Renjith R <re...@gmail.com>
wrote:

> I don't know if you are able to see the screenshot in my previous mail.
> The following was your comment.
> I will start working on it if you think it is worth adding.
>
> *From: *Dominik Stadler <d....@gmx.at>
> *Subject: *Re: Suggestion on how to read huge excel files.
> *Date: *2015-06-20 15:24 (+0530)
> *List: *user@poi.apache.org
> <ht...@poi.apache.org>
>
> It seems not that many people need similar functionality currently,
> however it looks useful for handling very large documents.
>
> I looked at it and it looks good, some comments:
>
> * The finalize() in the Beans looks strange and should not be needed,
> these members are freed anyway and having to implement finalize()
> always looks fishy!
>
> Thanks... Dominik.

Re: [Suggestion] Enhancement for reading big excel files

Posted by Renjith R <re...@gmail.com>.
I don't know if you are able to see the screenshot in my previous mail.
The following was your comment.
I will start working on it if you think it is worth adding.

*From: *Dominik Stadler <d....@gmx.at>
*Subject: *Re: Suggestion on how to read huge excel files.
*Date: *2015-06-20 15:24 (+0530)
*List: *user@poi.apache.org
<ht...@poi.apache.org>

It seems not that many people need similar functionality currently,
however it looks useful for handling very large documents.

I looked at it and it looks good, some comments:

* The finalize() in the Beans looks strange and should not be needed,
these members are freed anyway and having to implement finalize()
always looks fishy!

Thanks... Dominik.
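
To make the finalize() remark above concrete, here is a small sketch under
assumptions. The original beans are not shown in this thread, so CellBean and
SheetSource are hypothetical names. A plain value holder needs no finalize()
because the garbage collector reclaims its fields anyway; where a real
resource such as a stream is held, AutoCloseable with try-with-resources is
the usual alternative.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class CellBeanSketch {

    /** Plain value holder: no finalize(), the GC frees the fields on its own. */
    static final class CellBean {
        private final String reference;
        private final String value;

        CellBean(String reference, String value) {
            this.reference = reference;
            this.value = value;
        }

        String getReference() { return reference; }
        String getValue() { return value; }
    }

    /** Holder of an external resource: cleanup is explicit via close(). */
    static final class SheetSource implements AutoCloseable {
        private final InputStream in;
        private boolean closed;

        SheetSource(InputStream in) {
            this.in = in;
        }

        boolean isClosed() { return closed; }

        @Override
        public void close() throws IOException {
            in.close();
            closed = true;
        }
    }

    public static void main(String[] args) throws IOException {
        SheetSource kept;
        // try-with-resources calls close() deterministically at the end of the block.
        try (SheetSource source = new SheetSource(new ByteArrayInputStream(new byte[0]))) {
            kept = source;
            CellBean bean = new CellBean("A1", "42");
            System.out.println(bean.getReference() + " = " + bean.getValue());
        }
        System.out.println("source closed: " + kept.isClosed());
    }
}
```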



On Sun, Dec 25, 2016 at 4:12 PM, Renjith R <re...@gmail.com>
wrote:

> OK, I recall that. It was you who did the code review at that time.

Re: [Suggestion] Enhancement for reading big excel files

Posted by Renjith R <re...@gmail.com>.
OK, I recall that. It was you who did the code review at that time.

On Sun, Dec 25, 2016 at 4:04 PM, Renjith R <re...@gmail.com>
wrote:

> Thanks, Dominik. I'll try to resend it.
> Let me know if you can see the attachments.

Re: [Suggestion] Enhancement for reading big excel files

Posted by Renjith R <re...@gmail.com>.
Thanks, Dominik. I'll try to resend it.
Let me know if you can see the attachments.

On Fri, Dec 23, 2016 at 6:57 AM, Renjith R <re...@gmail.com>
wrote:

> Hi Developers,
>
>     A couple of years ago I suggested an enhancement for reading very large
> Excel files using the StAX API. The document is attached. Unfortunately, I
> did not get a chance to work on it. Do you think it makes sense for me to
> start working on it now? Kindly let me know your suggestions.
>
> regards,
> Renjith
>