You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@poi.apache.org by Greg Woolsey <gr...@gmail.com> on 2018/02/23 06:53:08 UTC

baffling save behavior

I hesitate to even bring it up, thinking I should be able to figure it out,
but I've never seen behavior like this.  Keep reading if you want to twist
your brain in knots like mine.

TL/DR - stop now if you'd rather be productive for your day job.

Somehow, I have a consistent state where a file originating in Excel,
modified through POI, then saved and opened in Excel causes not just an
Excel repair, but incorrect/old data as well.

So far, just a coding error, I figure.  But the weirdness starts when I use
the debugger to check on things.  A specific row that I know comes out
wrong has the right cell values when I use
wb.getSheet(#).getRow(#).getCell(#).toString(), but when I check the CTRow
and CTCell objects, then I find the old values still.  So somehow, I've got
the XSSFRow._cells collection out of sync with the CTRow _row contents.

Anyone know how I could have done that?

I may be calling workbook.write() in multiple places for different
purposes/copies.  It looks to me like XSSFRow.onDocumentWrite() explicitly
replaces the CTRow array of CTCell beans with new ones due to bug, #56170,
but it is replacing the CTCell references in the XSSFCell objects with the
new copies, .

I haven't found a smoking gun yet, but any thoughts are welcome.  Perhaps
the issue may lie with the formula evaluator created prior to the save
event?

It may be related to that bug, or even directly tied to it, as that was
about updating tables, and that's part of the updates my process is doing
as well.

Sorry for the long document, normally I just figure these out on my own,
but I've been at this a week so far, which has never happened before.  If
it comes down to black box behavior by XMLBeans, that would help explain
it, as I know next to nothing about that library's details.

Greg

Re: baffling save behavior

Posted by Dominik Stadler <do...@gmx.at>.
Hi Greg,

Nice catch and fix!

Dominik.

On Sat, Feb 24, 2018 at 10:06 PM, Greg Woolsey <gr...@gmail.com>
wrote:

> I did it - I created a reproducible test case that involves pure POI API
> calls, no starting workbook produced elsewhere.  I'll file a new bug to
> reference when I commit.
>
> On Fri, Feb 23, 2018 at 10:13 PM Greg Woolsey <gr...@gmail.com>
> wrote:
>
> > Yes, I thought of a possible approach to a test case tonight while doing
> > the kids' activities transportation. If I have time this weekend I'll
> give
> > it a stab.
> >
> > On Fri, Feb 23, 2018, 21:48 Dominik Stadler <do...@gmx.at>
> > wrote:
> >
> >> Hi,
> >>
> >> Your analysis sounds sensible and changing onDocumentWrite() looks ok to
> >> me
> >> as well from a quick look at the code, we would only add an allocation
> of
> >> the array, which is quit performance-wise compared to all the stuff that
> >> happens while saving the document.
> >>
> >> A reproducing test-case would be nice, so we can verify this going
> >> forward.
> >>
> >> Dominik.
> >>
> >>
> >> On Fri, Feb 23, 2018 at 10:34 PM, Greg Woolsey <gr...@gmail.com>
> >> wrote:
> >>
> >> > I'm finally able to work around the problem by changing
> >> > XSSFRow.onDocumentWrite() to always copy the current state of the
> _cells
> >> > collection to the _row CArray, instead of only when things don't look
> >> like
> >> > they are ordered.  There are several cases I've uncovered where the
> two
> >> > groups can be the same size, with the same cell reference strings in
> the
> >> > same order, but still not equal.  One involves adding cells then
> saving
> >> > then editing, another involves possible side effects of
> >> > XSSFCell.setBlank().
> >> >
> >> > The safest option I can see is to just simply the method to just the
> >> code
> >> > inside the if(!isOrdered) statement, without the precondition
> >> checking.  I
> >> > copied that code to my test project and applied it after every data
> >> table
> >> > row update, and that fixed my issue without appearing to cause a
> >> > performance hit.  In fact, removing the ordering checking probably
> >> makes it
> >> > a constant time change, since it was doing essentially the same loops
> >> > whether it applied any changes or not.
> >> >
> >> > Anyone want to weigh in on that simplification of
> >> > XSSFRow.onDocumentWrite()?
> >> >
> >> >
> >> > On Fri, Feb 23, 2018 at 12:55 PM Greg Woolsey <greg.woolsey@gmail.com
> >
> >> > wrote:
> >> >
> >> > > For what it's worth, I'm using a build from last August.  The code
> >> > changes
> >> > > I'm seeing since then are almost entirely about API changes to use
> >> Enums
> >> > > and ints and remove deprecated methods.
> >> > >
> >> > > I'm suspicious there are not a lot of use cases where a document is
> >> > > opened, edited, saved, edited again without re-opening, then saved
> >> again.
> >> > > That's where I'm seeing the sync issues, after the first save.  Then
> >> some
> >> > > subsequent edits get lost.  it appears to be for cells that did not
> >> exist
> >> > > originally, then were added by edits.  After the first save, edits
> to
> >> > these
> >> > > cells are not passed on to the CTRow, causing the second save to
> >> contain
> >> > > incorrect data.  I'm still digging.
> >> > >
> >> > > On Fri, Feb 23, 2018 at 11:09 AM Alain FAGOT BÉAREZ <
> >> > ABearez@for-scala.it>
> >> > > wrote:
> >> > >
> >> > >> Hi,
> >> > >>
> >> > >> It might have been the same symptoms which have led Sandeep Tiwari
> to
> >> > >> make a code change that I was reluctant to accept in the branch
> about
> >> > >> charts in XWPF. His explanations were in the line of yours. But I
> >> > couldn't
> >> > >> not accept the idea that such a fundamental feature had been
> broken.
> >> > >>
> >> > >> I hope to have some time this weekend to dive into the commits
> >> history
> >> > to
> >> > >> point out any change done recently that might affect that.
> >> > >>
> >> > >> Best regards,
> >> > >> Alain FAGOT BÉAREZ
> >> > >>
> >> > >> ⁣
> >> > >>
> >> > >> Von: Greg Woolsey
> >> > >> Gesendet: Fri Feb 23 03:53:08 GMT-03:00 2018
> >> > >> An: POI Developers List
> >> > >> Betreff: baffling save behavior
> >> > >>
> >> > >> I hesitate to even bring it up, thinking I should be able to figure
> >> it
> >> > >> out,
> >> > >> but I've never seen behavior like this. Keep reading if you want to
> >> > twist
> >> > >> your brain in knots like mine.
> >> > >>
> >> > >> TL/DR - stop now if you'd rather be productive for your day job.
> >> > >>
> >> > >> Somehow, I have a consistent state where a file originating in
> Excel,
> >> > >> modified through POI, then saved and opened in Excel causes not
> just
> >> an
> >> > >> Excel repair, but incorrect/old data as well.
> >> > >>
> >> > >> So far, just a coding error, I figure. But the weirdness starts
> when
> >> I
> >> > use
> >> > >> the debugger to check on things. A specific row that I know comes
> out
> >> > >> wrong has the right cell values when I use
> >> > >> wb.getSheet(#).getRow(#).getCell(#).toString(), but when I check
> the
> >> > CTRow
> >> > >> and CTCell objects, then I find the old values still. So somehow,
> >> I've
> >> > got
> >> > >> the XSSFRow._cells collection out of sync with the CTRow _row
> >> contents.
> >> > >>
> >> > >> Anyone know how I could have done that?
> >> > >>
> >> > >> I may be calling workbook.write() in multiple places for different
> >> > >> purposes/copies. It looks to me like XSSFRow.onDocumentWrite()
> >> > explicitly
> >> > >> replaces the CTRow array of CTCell beans with new ones due to bug,
> >> > #56170,
> >> > >> but it is replacing the CTCell references in the XSSFCell objects
> >> with
> >> > the
> >> > >> new copies, .
> >> > >>
> >> > >> I haven't found a smoking gun yet, but any thoughts are welcome.
> >> Perhaps
> >> > >> the issue may lie with the formula evaluator created prior to the
> >> save
> >> > >> event?
> >> > >>
> >> > >> It may be related to that bug, or even directly tied to it, as that
> >> was
> >> > >> about updating tables, and that's part of the updates my process is
> >> > doing
> >> > >> as well.
> >> > >>
> >> > >> Sorry for the long document, normally I just figure these out on my
> >> own,
> >> > >> but I've been at this a week so far, which has never happened
> >> before. If
> >> > >> it comes down to black box behavior by XMLBeans, that would help
> >> explain
> >> > >> it, as I know next to nothing about that library's details.
> >> > >>
> >> > >> Greg
> >> > >>
> >> > >>
> >> > >> -------- Originale Nachricht --------
> >> > >> Von: Greg Woolsey <gr...@gmail.com>
> >> > >> Gesendet: Fri Feb 23 03:53:08 GMT-03:00 2018
> >> > >> An: POI Developers List <de...@poi.apache.org>
> >> > >> Betreff: baffling save behavior
> >> > >>
> >> > >> I hesitate to even bring it up, thinking I should be able to figure
> >> it
> >> > >> out,
> >> > >> but I've never seen behavior like this.  Keep reading if you want
> to
> >> > twist
> >> > >> your brain in knots like mine.
> >> > >>
> >> > >> TL/DR - stop now if you'd rather be productive for your day job.
> >> > >>
> >> > >> Somehow, I have a consistent state where a file originating in
> Excel,
> >> > >> modified through POI, then saved and opened in Excel causes not
> just
> >> an
> >> > >> Excel repair, but incorrect/old data as well.
> >> > >>
> >> > >> So far, just a coding error, I figure.  But the weirdness starts
> >> when I
> >> > >> use
> >> > >> the debugger to check on things.  A specific row that I know comes
> >> out
> >> > >> wrong has the right cell values when I use
> >> > >> wb.getSheet(#).getRow(#).getCell(#).toString(), but when I check
> the
> >> > CTRow
> >> > >> and CTCell objects, then I find the old values still.  So somehow,
> >> I've
> >> > >> got
> >> > >> the XSSFRow._cells collection out of sync with the CTRow _row
> >> contents.
> >> > >>
> >> > >> Anyone know how I could have done that?
> >> > >>
> >> > >> I may be calling workbook.write() in multiple places for different
> >> > >> purposes/copies.  It looks to me like XSSFRow.onDocumentWrite()
> >> > explicitly
> >> > >> replaces the CTRow array of CTCell beans with new ones due to bug,
> >> > #56170,
> >> > >> but it is replacing the CTCell references in the XSSFCell objects
> >> with
> >> > the
> >> > >> new copies, .
> >> > >>
> >> > >> I haven't found a smoking gun yet, but any thoughts are welcome.
> >> > Perhaps
> >> > >> the issue may lie with the formula evaluator created prior to the
> >> save
> >> > >> event?
> >> > >>
> >> > >> It may be related to that bug, or even directly tied to it, as that
> >> was
> >> > >> about updating tables, and that's part of the updates my process is
> >> > doing
> >> > >> as well.
> >> > >>
> >> > >> Sorry for the long document, normally I just figure these out on my
> >> own,
> >> > >> but I've been at this a week so far, which has never happened
> before.
> >> > If
> >> > >> it comes down to black box behavior by XMLBeans, that would help
> >> explain
> >> > >> it, as I know next to nothing about that library's details.
> >> > >>
> >> > >> Greg
> >> > >>
> >> > >
> >> >
> >>
> >
>

Re: baffling save behavior

Posted by Greg Woolsey <gr...@gmail.com>.
I did it - I created a reproducible test case that involves pure POI API
calls, no starting workbook produced elsewhere.  I'll file a new bug to
reference when I commit.

On Fri, Feb 23, 2018 at 10:13 PM Greg Woolsey <gr...@gmail.com>
wrote:

> Yes, I thought of a possible approach to a test case tonight while doing
> the kids' activities transportation. If I have time this weekend I'll give
> it a stab.
>
> On Fri, Feb 23, 2018, 21:48 Dominik Stadler <do...@gmx.at>
> wrote:
>
>> Hi,
>>
>> Your analysis sounds sensible and changing onDocumentWrite() looks ok to
>> me
>> as well from a quick look at the code, we would only add an allocation of
>> the array, which is quit performance-wise compared to all the stuff that
>> happens while saving the document.
>>
>> A reproducing test-case would be nice, so we can verify this going
>> forward.
>>
>> Dominik.
>>
>>
>> On Fri, Feb 23, 2018 at 10:34 PM, Greg Woolsey <gr...@gmail.com>
>> wrote:
>>
>> > I'm finally able to work around the problem by changing
>> > XSSFRow.onDocumentWrite() to always copy the current state of the _cells
>> > collection to the _row CArray, instead of only when things don't look
>> like
>> > they are ordered.  There are several cases I've uncovered where the two
>> > groups can be the same size, with the same cell reference strings in the
>> > same order, but still not equal.  One involves adding cells then saving
>> > then editing, another involves possible side effects of
>> > XSSFCell.setBlank().
>> >
>> > The safest option I can see is to just simply the method to just the
>> code
>> > inside the if(!isOrdered) statement, without the precondition
>> checking.  I
>> > copied that code to my test project and applied it after every data
>> table
>> > row update, and that fixed my issue without appearing to cause a
>> > performance hit.  In fact, removing the ordering checking probably
>> makes it
>> > a constant time change, since it was doing essentially the same loops
>> > whether it applied any changes or not.
>> >
>> > Anyone want to weigh in on that simplification of
>> > XSSFRow.onDocumentWrite()?
>> >
>> >
>> > On Fri, Feb 23, 2018 at 12:55 PM Greg Woolsey <gr...@gmail.com>
>> > wrote:
>> >
>> > > For what it's worth, I'm using a build from last August.  The code
>> > changes
>> > > I'm seeing since then are almost entirely about API changes to use
>> Enums
>> > > and ints and remove deprecated methods.
>> > >
>> > > I'm suspicious there are not a lot of use cases where a document is
>> > > opened, edited, saved, edited again without re-opening, then saved
>> again.
>> > > That's where I'm seeing the sync issues, after the first save.  Then
>> some
>> > > subsequent edits get lost.  it appears to be for cells that did not
>> exist
>> > > originally, then were added by edits.  After the first save, edits to
>> > these
>> > > cells are not passed on to the CTRow, causing the second save to
>> contain
>> > > incorrect data.  I'm still digging.
>> > >
>> > > On Fri, Feb 23, 2018 at 11:09 AM Alain FAGOT BÉAREZ <
>> > ABearez@for-scala.it>
>> > > wrote:
>> > >
>> > >> Hi,
>> > >>
>> > >> It might have been the same symptoms which have led Sandeep Tiwari to
>> > >> make a code change that I was reluctant to accept in the branch about
>> > >> charts in XWPF. His explanations were in the line of yours. But I
>> > couldn't
>> > >> not accept the idea that such a fundamental feature had been broken.
>> > >>
>> > >> I hope to have some time this weekend to dive into the commits
>> history
>> > to
>> > >> point out any change done recently that might affect that.
>> > >>
>> > >> Best regards,
>> > >> Alain FAGOT BÉAREZ
>> > >>
>> > >> ⁣
>> > >>
>> > >> Von: Greg Woolsey
>> > >> Gesendet: Fri Feb 23 03:53:08 GMT-03:00 2018
>> > >> An: POI Developers List
>> > >> Betreff: baffling save behavior
>> > >>
>> > >> I hesitate to even bring it up, thinking I should be able to figure
>> it
>> > >> out,
>> > >> but I've never seen behavior like this. Keep reading if you want to
>> > twist
>> > >> your brain in knots like mine.
>> > >>
>> > >> TL/DR - stop now if you'd rather be productive for your day job.
>> > >>
>> > >> Somehow, I have a consistent state where a file originating in Excel,
>> > >> modified through POI, then saved and opened in Excel causes not just
>> an
>> > >> Excel repair, but incorrect/old data as well.
>> > >>
>> > >> So far, just a coding error, I figure. But the weirdness starts when
>> I
>> > use
>> > >> the debugger to check on things. A specific row that I know comes out
>> > >> wrong has the right cell values when I use
>> > >> wb.getSheet(#).getRow(#).getCell(#).toString(), but when I check the
>> > CTRow
>> > >> and CTCell objects, then I find the old values still. So somehow,
>> I've
>> > got
>> > >> the XSSFRow._cells collection out of sync with the CTRow _row
>> contents.
>> > >>
>> > >> Anyone know how I could have done that?
>> > >>
>> > >> I may be calling workbook.write() in multiple places for different
>> > >> purposes/copies. It looks to me like XSSFRow.onDocumentWrite()
>> > explicitly
>> > >> replaces the CTRow array of CTCell beans with new ones due to bug,
>> > #56170,
>> > >> but it is replacing the CTCell references in the XSSFCell objects
>> with
>> > the
>> > >> new copies, .
>> > >>
>> > >> I haven't found a smoking gun yet, but any thoughts are welcome.
>> Perhaps
>> > >> the issue may lie with the formula evaluator created prior to the
>> save
>> > >> event?
>> > >>
>> > >> It may be related to that bug, or even directly tied to it, as that
>> was
>> > >> about updating tables, and that's part of the updates my process is
>> > doing
>> > >> as well.
>> > >>
>> > >> Sorry for the long document, normally I just figure these out on my
>> own,
>> > >> but I've been at this a week so far, which has never happened
>> before. If
>> > >> it comes down to black box behavior by XMLBeans, that would help
>> explain
>> > >> it, as I know next to nothing about that library's details.
>> > >>
>> > >> Greg
>> > >>
>> > >>
>> > >> -------- Originale Nachricht --------
>> > >> Von: Greg Woolsey <gr...@gmail.com>
>> > >> Gesendet: Fri Feb 23 03:53:08 GMT-03:00 2018
>> > >> An: POI Developers List <de...@poi.apache.org>
>> > >> Betreff: baffling save behavior
>> > >>
>> > >> I hesitate to even bring it up, thinking I should be able to figure
>> it
>> > >> out,
>> > >> but I've never seen behavior like this.  Keep reading if you want to
>> > twist
>> > >> your brain in knots like mine.
>> > >>
>> > >> TL/DR - stop now if you'd rather be productive for your day job.
>> > >>
>> > >> Somehow, I have a consistent state where a file originating in Excel,
>> > >> modified through POI, then saved and opened in Excel causes not just
>> an
>> > >> Excel repair, but incorrect/old data as well.
>> > >>
>> > >> So far, just a coding error, I figure.  But the weirdness starts
>> when I
>> > >> use
>> > >> the debugger to check on things.  A specific row that I know comes
>> out
>> > >> wrong has the right cell values when I use
>> > >> wb.getSheet(#).getRow(#).getCell(#).toString(), but when I check the
>> > CTRow
>> > >> and CTCell objects, then I find the old values still.  So somehow,
>> I've
>> > >> got
>> > >> the XSSFRow._cells collection out of sync with the CTRow _row
>> contents.
>> > >>
>> > >> Anyone know how I could have done that?
>> > >>
>> > >> I may be calling workbook.write() in multiple places for different
>> > >> purposes/copies.  It looks to me like XSSFRow.onDocumentWrite()
>> > explicitly
>> > >> replaces the CTRow array of CTCell beans with new ones due to bug,
>> > #56170,
>> > >> but it is replacing the CTCell references in the XSSFCell objects
>> with
>> > the
>> > >> new copies, .
>> > >>
>> > >> I haven't found a smoking gun yet, but any thoughts are welcome.
>> > Perhaps
>> > >> the issue may lie with the formula evaluator created prior to the
>> save
>> > >> event?
>> > >>
>> > >> It may be related to that bug, or even directly tied to it, as that
>> was
>> > >> about updating tables, and that's part of the updates my process is
>> > doing
>> > >> as well.
>> > >>
>> > >> Sorry for the long document, normally I just figure these out on my
>> own,
>> > >> but I've been at this a week so far, which has never happened before.
>> > If
>> > >> it comes down to black box behavior by XMLBeans, that would help
>> explain
>> > >> it, as I know next to nothing about that library's details.
>> > >>
>> > >> Greg
>> > >>
>> > >
>> >
>>
>

Re: baffling save behavior

Posted by Greg Woolsey <gr...@gmail.com>.
Yes, I thought of a possible approach to a test case tonight while doing
the kids' activities transportation. If I have time this weekend I'll give
it a stab.

On Fri, Feb 23, 2018, 21:48 Dominik Stadler <do...@gmx.at> wrote:

> Hi,
>
> Your analysis sounds sensible and changing onDocumentWrite() looks ok to me
> as well from a quick look at the code, we would only add an allocation of
> the array, which is quit performance-wise compared to all the stuff that
> happens while saving the document.
>
> A reproducing test-case would be nice, so we can verify this going forward.
>
> Dominik.
>
>
> On Fri, Feb 23, 2018 at 10:34 PM, Greg Woolsey <gr...@gmail.com>
> wrote:
>
> > I'm finally able to work around the problem by changing
> > XSSFRow.onDocumentWrite() to always copy the current state of the _cells
> > collection to the _row CArray, instead of only when things don't look
> like
> > they are ordered.  There are several cases I've uncovered where the two
> > groups can be the same size, with the same cell reference strings in the
> > same order, but still not equal.  One involves adding cells then saving
> > then editing, another involves possible side effects of
> > XSSFCell.setBlank().
> >
> > The safest option I can see is to just simply the method to just the code
> > inside the if(!isOrdered) statement, without the precondition checking.
> I
> > copied that code to my test project and applied it after every data table
> > row update, and that fixed my issue without appearing to cause a
> > performance hit.  In fact, removing the ordering checking probably makes
> it
> > a constant time change, since it was doing essentially the same loops
> > whether it applied any changes or not.
> >
> > Anyone want to weigh in on that simplification of
> > XSSFRow.onDocumentWrite()?
> >
> >
> > On Fri, Feb 23, 2018 at 12:55 PM Greg Woolsey <gr...@gmail.com>
> > wrote:
> >
> > > For what it's worth, I'm using a build from last August.  The code
> > changes
> > > I'm seeing since then are almost entirely about API changes to use
> Enums
> > > and ints and remove deprecated methods.
> > >
> > > I'm suspicious there are not a lot of use cases where a document is
> > > opened, edited, saved, edited again without re-opening, then saved
> again.
> > > That's where I'm seeing the sync issues, after the first save.  Then
> some
> > > subsequent edits get lost.  it appears to be for cells that did not
> exist
> > > originally, then were added by edits.  After the first save, edits to
> > these
> > > cells are not passed on to the CTRow, causing the second save to
> contain
> > > incorrect data.  I'm still digging.
> > >
> > > On Fri, Feb 23, 2018 at 11:09 AM Alain FAGOT BÉAREZ <
> > ABearez@for-scala.it>
> > > wrote:
> > >
> > >> Hi,
> > >>
> > >> It might have been the same symptoms which have led Sandeep Tiwari to
> > >> make a code change that I was reluctant to accept in the branch about
> > >> charts in XWPF. His explanations were in the line of yours. But I
> > couldn't
> > >> not accept the idea that such a fundamental feature had been broken.
> > >>
> > >> I hope to have some time this weekend to dive into the commits history
> > to
> > >> point out any change done recently that might affect that.
> > >>
> > >> Best regards,
> > >> Alain FAGOT BÉAREZ
> > >>
> > >> ⁣
> > >>
> > >> Von: Greg Woolsey
> > >> Gesendet: Fri Feb 23 03:53:08 GMT-03:00 2018
> > >> An: POI Developers List
> > >> Betreff: baffling save behavior
> > >>
> > >> I hesitate to even bring it up, thinking I should be able to figure it
> > >> out,
> > >> but I've never seen behavior like this. Keep reading if you want to
> > twist
> > >> your brain in knots like mine.
> > >>
> > >> TL/DR - stop now if you'd rather be productive for your day job.
> > >>
> > >> Somehow, I have a consistent state where a file originating in Excel,
> > >> modified through POI, then saved and opened in Excel causes not just
> an
> > >> Excel repair, but incorrect/old data as well.
> > >>
> > >> So far, just a coding error, I figure. But the weirdness starts when I
> > use
> > >> the debugger to check on things. A specific row that I know comes out
> > >> wrong has the right cell values when I use
> > >> wb.getSheet(#).getRow(#).getCell(#).toString(), but when I check the
> > CTRow
> > >> and CTCell objects, then I find the old values still. So somehow, I've
> > got
> > >> the XSSFRow._cells collection out of sync with the CTRow _row
> contents.
> > >>
> > >> Anyone know how I could have done that?
> > >>
> > >> I may be calling workbook.write() in multiple places for different
> > >> purposes/copies. It looks to me like XSSFRow.onDocumentWrite()
> > explicitly
> > >> replaces the CTRow array of CTCell beans with new ones due to bug,
> > #56170,
> > >> but it is replacing the CTCell references in the XSSFCell objects with
> > the
> > >> new copies, .
> > >>
> > >> I haven't found a smoking gun yet, but any thoughts are welcome.
> Perhaps
> > >> the issue may lie with the formula evaluator created prior to the save
> > >> event?
> > >>
> > >> It may be related to that bug, or even directly tied to it, as that
> was
> > >> about updating tables, and that's part of the updates my process is
> > doing
> > >> as well.
> > >>
> > >> Sorry for the long document, normally I just figure these out on my
> own,
> > >> but I've been at this a week so far, which has never happened before.
> If
> > >> it comes down to black box behavior by XMLBeans, that would help
> explain
> > >> it, as I know next to nothing about that library's details.
> > >>
> > >> Greg
> > >>
> > >>
> > >> -------- Originale Nachricht --------
> > >> Von: Greg Woolsey <gr...@gmail.com>
> > >> Gesendet: Fri Feb 23 03:53:08 GMT-03:00 2018
> > >> An: POI Developers List <de...@poi.apache.org>
> > >> Betreff: baffling save behavior
> > >>
> > >> I hesitate to even bring it up, thinking I should be able to figure it
> > >> out,
> > >> but I've never seen behavior like this.  Keep reading if you want to
> > twist
> > >> your brain in knots like mine.
> > >>
> > >> TL/DR - stop now if you'd rather be productive for your day job.
> > >>
> > >> Somehow, I have a consistent state where a file originating in Excel,
> > >> modified through POI, then saved and opened in Excel causes not just
> an
> > >> Excel repair, but incorrect/old data as well.
> > >>
> > >> So far, just a coding error, I figure.  But the weirdness starts when
> I
> > >> use
> > >> the debugger to check on things.  A specific row that I know comes out
> > >> wrong has the right cell values when I use
> > >> wb.getSheet(#).getRow(#).getCell(#).toString(), but when I check the
> > CTRow
> > >> and CTCell objects, then I find the old values still.  So somehow,
> I've
> > >> got
> > >> the XSSFRow._cells collection out of sync with the CTRow _row
> contents.
> > >>
> > >> Anyone know how I could have done that?
> > >>
> > >> I may be calling workbook.write() in multiple places for different
> > >> purposes/copies.  It looks to me like XSSFRow.onDocumentWrite()
> > explicitly
> > >> replaces the CTRow array of CTCell beans with new ones due to bug,
> > #56170,
> > >> but it is replacing the CTCell references in the XSSFCell objects with
> > the
> > >> new copies, .
> > >>
> > >> I haven't found a smoking gun yet, but any thoughts are welcome.
> > Perhaps
> > >> the issue may lie with the formula evaluator created prior to the save
> > >> event?
> > >>
> > >> It may be related to that bug, or even directly tied to it, as that
> was
> > >> about updating tables, and that's part of the updates my process is
> > doing
> > >> as well.
> > >>
> > >> Sorry for the long document, normally I just figure these out on my
> own,
> > >> but I've been at this a week so far, which has never happened before.
> > If
> > >> it comes down to black box behavior by XMLBeans, that would help
> explain
> > >> it, as I know next to nothing about that library's details.
> > >>
> > >> Greg
> > >>
> > >
> >
>

Re: baffling save behavior

Posted by Dominik Stadler <do...@gmx.at>.
Hi,

Your analysis sounds sensible and changing onDocumentWrite() looks ok to me
as well from a quick look at the code, we would only add an allocation of
the array, which is quit performance-wise compared to all the stuff that
happens while saving the document.

A reproducing test-case would be nice, so we can verify this going forward.

Dominik.


On Fri, Feb 23, 2018 at 10:34 PM, Greg Woolsey <gr...@gmail.com>
wrote:

> I'm finally able to work around the problem by changing
> XSSFRow.onDocumentWrite() to always copy the current state of the _cells
> collection to the _row CArray, instead of only when things don't look like
> they are ordered.  There are several cases I've uncovered where the two
> groups can be the same size, with the same cell reference strings in the
> same order, but still not equal.  One involves adding cells then saving
> then editing, another involves possible side effects of
> XSSFCell.setBlank().
>
> The safest option I can see is to just simply the method to just the code
> inside the if(!isOrdered) statement, without the precondition checking.  I
> copied that code to my test project and applied it after every data table
> row update, and that fixed my issue without appearing to cause a
> performance hit.  In fact, removing the ordering checking probably makes it
> a constant time change, since it was doing essentially the same loops
> whether it applied any changes or not.
>
> Anyone want to weigh in on that simplification of
> XSSFRow.onDocumentWrite()?
>
>
> On Fri, Feb 23, 2018 at 12:55 PM Greg Woolsey <gr...@gmail.com>
> wrote:
>
> > For what it's worth, I'm using a build from last August.  The code
> changes
> > I'm seeing since then are almost entirely about API changes to use Enums
> > and ints and remove deprecated methods.
> >
> > I'm suspicious there are not a lot of use cases where a document is
> > opened, edited, saved, edited again without re-opening, then saved again.
> > That's where I'm seeing the sync issues, after the first save.  Then some
> > subsequent edits get lost.  it appears to be for cells that did not exist
> > originally, then were added by edits.  After the first save, edits to
> these
> > cells are not passed on to the CTRow, causing the second save to contain
> > incorrect data.  I'm still digging.
> >
> > On Fri, Feb 23, 2018 at 11:09 AM Alain FAGOT BÉAREZ <
> ABearez@for-scala.it>
> > wrote:
> >
> >> Hi,
> >>
> >> It might have been the same symptoms which have led Sandeep Tiwari to
> >> make a code change that I was reluctant to accept in the branch about
> >> charts in XWPF. His explanations were in the line of yours. But I
> couldn't
> >> not accept the idea that such a fundamental feature had been broken.
> >>
> >> I hope to have some time this weekend to dive into the commits history
> to
> >> point out any change done recently that might affect that.
> >>
> >> Best regards,
> >> Alain FAGOT BÉAREZ
> >>
> >> ⁣
> >>
> >> Von: Greg Woolsey
> >> Gesendet: Fri Feb 23 03:53:08 GMT-03:00 2018
> >> An: POI Developers List
> >> Betreff: baffling save behavior
> >>
> >> I hesitate to even bring it up, thinking I should be able to figure it
> >> out,
> >> but I've never seen behavior like this. Keep reading if you want to
> twist
> >> your brain in knots like mine.
> >>
> >> TL/DR - stop now if you'd rather be productive for your day job.
> >>
> >> Somehow, I have a consistent state where a file originating in Excel,
> >> modified through POI, then saved and opened in Excel causes not just an
> >> Excel repair, but incorrect/old data as well.
> >>
> >> So far, just a coding error, I figure. But the weirdness starts when I
> use
> >> the debugger to check on things. A specific row that I know comes out
> >> wrong has the right cell values when I use
> >> wb.getSheet(#).getRow(#).getCell(#).toString(), but when I check the
> CTRow
> >> and CTCell objects, then I find the old values still. So somehow, I've
> got
> >> the XSSFRow._cells collection out of sync with the CTRow _row contents.
> >>
> >> Anyone know how I could have done that?
> >>
> >> I may be calling workbook.write() in multiple places for different
> >> purposes/copies. It looks to me like XSSFRow.onDocumentWrite()
> explicitly
> >> replaces the CTRow array of CTCell beans with new ones due to bug,
> #56170,
> >> but it is replacing the CTCell references in the XSSFCell objects with
> the
> >> new copies, .
> >>
> >> I haven't found a smoking gun yet, but any thoughts are welcome. Perhaps
> >> the issue may lie with the formula evaluator created prior to the save
> >> event?
> >>
> >> It may be related to that bug, or even directly tied to it, as that was
> >> about updating tables, and that's part of the updates my process is
> doing
> >> as well.
> >>
> >> Sorry for the long document, normally I just figure these out on my own,
> >> but I've been at this a week so far, which has never happened before. If
> >> it comes down to black box behavior by XMLBeans, that would help explain
> >> it, as I know next to nothing about that library's details.
> >>
> >> Greg
> >>
> >>
> >> -------- Originale Nachricht --------
> >> Von: Greg Woolsey <gr...@gmail.com>
> >> Gesendet: Fri Feb 23 03:53:08 GMT-03:00 2018
> >> An: POI Developers List <de...@poi.apache.org>
> >> Betreff: baffling save behavior
> >>
> >> I hesitate to even bring it up, thinking I should be able to figure it
> >> out,
> >> but I've never seen behavior like this.  Keep reading if you want to
> twist
> >> your brain in knots like mine.
> >>
> >> TL/DR - stop now if you'd rather be productive for your day job.
> >>
> >> Somehow, I have a consistent state where a file originating in Excel,
> >> modified through POI, then saved and opened in Excel causes not just an
> >> Excel repair, but incorrect/old data as well.
> >>
> >> So far, just a coding error, I figure.  But the weirdness starts when I
> >> use
> >> the debugger to check on things.  A specific row that I know comes out
> >> wrong has the right cell values when I use
> >> wb.getSheet(#).getRow(#).getCell(#).toString(), but when I check the
> CTRow
> >> and CTCell objects, then I find the old values still.  So somehow, I've
> >> got
> >> the XSSFRow._cells collection out of sync with the CTRow _row contents.
> >>
> >> Anyone know how I could have done that?
> >>
> >> I may be calling workbook.write() in multiple places for different
> >> purposes/copies.  It looks to me like XSSFRow.onDocumentWrite()
> explicitly
> >> replaces the CTRow array of CTCell beans with new ones due to bug,
> #56170,
> >> but it is replacing the CTCell references in the XSSFCell objects with
> the
> >> new copies, .
> >>
> >> I haven't found a smoking gun yet, but any thoughts are welcome.
> Perhaps
> >> the issue may lie with the formula evaluator created prior to the save
> >> event?
> >>
> >> It may be related to that bug, or even directly tied to it, as that was
> >> about updating tables, and that's part of the updates my process is
> doing
> >> as well.
> >>
> >> Sorry for the long document, normally I just figure these out on my own,
> >> but I've been at this a week so far, which has never happened before.
> If
> >> it comes down to black box behavior by XMLBeans, that would help explain
> >> it, as I know next to nothing about that library's details.
> >>
> >> Greg
> >>
> >
>

Re: baffling save behavior

Posted by Greg Woolsey <gr...@gmail.com>.
I'm finally able to work around the problem by changing
XSSFRow.onDocumentWrite() to always copy the current state of the _cells
collection to the _row CArray, instead of only when things don't look like
they are ordered.  There are several cases I've uncovered where the two
groups can be the same size, with the same cell reference strings in the
same order, but still not equal.  One involves adding cells then saving
then editing, another involves possible side effects of XSSFCell.setBlank().

The safest option I can see is to just simply the method to just the code
inside the if(!isOrdered) statement, without the precondition checking.  I
copied that code to my test project and applied it after every data table
row update, and that fixed my issue without appearing to cause a
performance hit.  In fact, removing the ordering checking probably makes it
a constant time change, since it was doing essentially the same loops
whether it applied any changes or not.

Anyone want to weigh in on that simplification of XSSFRow.onDocumentWrite()?


On Fri, Feb 23, 2018 at 12:55 PM Greg Woolsey <gr...@gmail.com>
wrote:

> For what it's worth, I'm using a build from last August.  The code changes
> I'm seeing since then are almost entirely about API changes to use Enums
> and ints and remove deprecated methods.
>
> I'm suspicious there are not a lot of use cases where a document is
> opened, edited, saved, edited again without re-opening, then saved again.
> That's where I'm seeing the sync issues, after the first save.  Then some
> subsequent edits get lost.  it appears to be for cells that did not exist
> originally, then were added by edits.  After the first save, edits to these
> cells are not passed on to the CTRow, causing the second save to contain
> incorrect data.  I'm still digging.
>
> On Fri, Feb 23, 2018 at 11:09 AM Alain FAGOT BÉAREZ <AB...@for-scala.it>
> wrote:
>
>> Hi,
>>
>> It might have been the same symptoms which have led Sandeep Tiwari to
>> make a code change that I was reluctant to accept in the branch about
>> charts in XWPF. His explanations were in the line of yours. But I couldn't
>> not accept the idea that such a fundamental feature had been broken.
>>
>> I hope to have some time this weekend to dive into the commits history to
>> point out any change done recently that might affect that.
>>
>> Best regards,
>> Alain FAGOT BÉAREZ
>>
>> ⁣
>>
>> Von: Greg Woolsey
>> Gesendet: Fri Feb 23 03:53:08 GMT-03:00 2018
>> An: POI Developers List
>> Betreff: baffling save behavior
>>
>> I hesitate to even bring it up, thinking I should be able to figure it
>> out,
>> but I've never seen behavior like this. Keep reading if you want to twist
>> your brain in knots like mine.
>>
>> TL/DR - stop now if you'd rather be productive for your day job.
>>
>> Somehow, I have a consistent state where a file originating in Excel,
>> modified through POI, then saved and opened in Excel causes not just an
>> Excel repair, but incorrect/old data as well.
>>
>> So far, just a coding error, I figure. But the weirdness starts when I use
>> the debugger to check on things. A specific row that I know comes out
>> wrong has the right cell values when I use
>> wb.getSheet(#).getRow(#).getCell(#).toString(), but when I check the CTRow
>> and CTCell objects, then I find the old values still. So somehow, I've got
>> the XSSFRow._cells collection out of sync with the CTRow _row contents.
>>
>> Anyone know how I could have done that?
>>
>> I may be calling workbook.write() in multiple places for different
>> purposes/copies. It looks to me like XSSFRow.onDocumentWrite() explicitly
>> replaces the CTRow array of CTCell beans with new ones due to bug, #56170,
>> but it is replacing the CTCell references in the XSSFCell objects with the
>> new copies, .
>>
>> I haven't found a smoking gun yet, but any thoughts are welcome. Perhaps
>> the issue may lie with the formula evaluator created prior to the save
>> event?
>>
>> It may be related to that bug, or even directly tied to it, as that was
>> about updating tables, and that's part of the updates my process is doing
>> as well.
>>
>> Sorry for the long document, normally I just figure these out on my own,
>> but I've been at this a week so far, which has never happened before. If
>> it comes down to black box behavior by XMLBeans, that would help explain
>> it, as I know next to nothing about that library's details.
>>
>> Greg
>>
>>
>> -------- Originale Nachricht --------
>> Von: Greg Woolsey <gr...@gmail.com>
>> Gesendet: Fri Feb 23 03:53:08 GMT-03:00 2018
>> An: POI Developers List <de...@poi.apache.org>
>> Betreff: baffling save behavior
>>
>> I hesitate to even bring it up, thinking I should be able to figure it
>> out,
>> but I've never seen behavior like this.  Keep reading if you want to twist
>> your brain in knots like mine.
>>
>> TL/DR - stop now if you'd rather be productive for your day job.
>>
>> Somehow, I have a consistent state where a file originating in Excel,
>> modified through POI, then saved and opened in Excel causes not just an
>> Excel repair, but incorrect/old data as well.
>>
>> So far, just a coding error, I figure.  But the weirdness starts when I
>> use
>> the debugger to check on things.  A specific row that I know comes out
>> wrong has the right cell values when I use
>> wb.getSheet(#).getRow(#).getCell(#).toString(), but when I check the CTRow
>> and CTCell objects, then I find the old values still.  So somehow, I've
>> got
>> the XSSFRow._cells collection out of sync with the CTRow _row contents.
>>
>> Anyone know how I could have done that?
>>
>> I may be calling workbook.write() in multiple places for different
>> purposes/copies.  It looks to me like XSSFRow.onDocumentWrite() explicitly
>> replaces the CTRow array of CTCell beans with new ones due to bug, #56170,
>> but it is replacing the CTCell references in the XSSFCell objects with the
>> new copies, .
>>
>> I haven't found a smoking gun yet, but any thoughts are welcome.  Perhaps
>> the issue may lie with the formula evaluator created prior to the save
>> event?
>>
>> It may be related to that bug, or even directly tied to it, as that was
>> about updating tables, and that's part of the updates my process is doing
>> as well.
>>
>> Sorry for the long document, normally I just figure these out on my own,
>> but I've been at this a week so far, which has never happened before.  If
>> it comes down to black box behavior by XMLBeans, that would help explain
>> it, as I know next to nothing about that library's details.
>>
>> Greg
>>
>

Re: baffling save behavior

Posted by Greg Woolsey <gr...@gmail.com>.
For what it's worth, I'm using a build from last August.  The code changes
I'm seeing since then are almost entirely about API changes to use Enums
and ints and remove deprecated methods.

I'm suspicious there are not a lot of use cases where a document is opened,
edited, saved, edited again without re-opening, then saved again.  That's
where I'm seeing the sync issues, after the first save.  Then some
subsequent edits get lost.  it appears to be for cells that did not exist
originally, then were added by edits.  After the first save, edits to these
cells are not passed on to the CTRow, causing the second save to contain
incorrect data.  I'm still digging.

On Fri, Feb 23, 2018 at 11:09 AM Alain FAGOT BÉAREZ <AB...@for-scala.it>
wrote:

> Hi,
>
> It might have been the same symptoms which have led Sandeep Tiwari to make
> a code change that I was reluctant to accept in the branch about charts in
> XWPF. His explanations were in the line of yours. But I couldn't not accept
> the idea that such a fundamental feature had been broken.
>
> I hope to have some time this weekend to dive into the commits history to
> point out any change done recently that might affect that.
>
> Best regards,
> Alain FAGOT BÉAREZ
>
> ⁣
>
> Von: Greg Woolsey
> Gesendet: Fri Feb 23 03:53:08 GMT-03:00 2018
> An: POI Developers List
> Betreff: baffling save behavior
>
> I hesitate to even bring it up, thinking I should be able to figure it out,
> but I've never seen behavior like this. Keep reading if you want to twist
> your brain in knots like mine.
>
> TL/DR - stop now if you'd rather be productive for your day job.
>
> Somehow, I have a consistent state where a file originating in Excel,
> modified through POI, then saved and opened in Excel causes not just an
> Excel repair, but incorrect/old data as well.
>
> So far, just a coding error, I figure. But the weirdness starts when I use
> the debugger to check on things. A specific row that I know comes out
> wrong has the right cell values when I use
> wb.getSheet(#).getRow(#).getCell(#).toString(), but when I check the CTRow
> and CTCell objects, then I find the old values still. So somehow, I've got
> the XSSFRow._cells collection out of sync with the CTRow _row contents.
>
> Anyone know how I could have done that?
>
> I may be calling workbook.write() in multiple places for different
> purposes/copies. It looks to me like XSSFRow.onDocumentWrite() explicitly
> replaces the CTRow array of CTCell beans with new ones due to bug, #56170,
> but it is replacing the CTCell references in the XSSFCell objects with the
> new copies, .
>
> I haven't found a smoking gun yet, but any thoughts are welcome. Perhaps
> the issue may lie with the formula evaluator created prior to the save
> event?
>
> It may be related to that bug, or even directly tied to it, as that was
> about updating tables, and that's part of the updates my process is doing
> as well.
>
> Sorry for the long document, normally I just figure these out on my own,
> but I've been at this a week so far, which has never happened before. If
> it comes down to black box behavior by XMLBeans, that would help explain
> it, as I know next to nothing about that library's details.
>
> Greg
>
>
> -------- Originale Nachricht --------
> Von: Greg Woolsey <gr...@gmail.com>
> Gesendet: Fri Feb 23 03:53:08 GMT-03:00 2018
> An: POI Developers List <de...@poi.apache.org>
> Betreff: baffling save behavior
>
> I hesitate to even bring it up, thinking I should be able to figure it out,
> but I've never seen behavior like this.  Keep reading if you want to twist
> your brain in knots like mine.
>
> TL/DR - stop now if you'd rather be productive for your day job.
>
> Somehow, I have a consistent state where a file originating in Excel,
> modified through POI, then saved and opened in Excel causes not just an
> Excel repair, but incorrect/old data as well.
>
> So far, just a coding error, I figure.  But the weirdness starts when I use
> the debugger to check on things.  A specific row that I know comes out
> wrong has the right cell values when I use
> wb.getSheet(#).getRow(#).getCell(#).toString(), but when I check the CTRow
> and CTCell objects, then I find the old values still.  So somehow, I've got
> the XSSFRow._cells collection out of sync with the CTRow _row contents.
>
> Anyone know how I could have done that?
>
> I may be calling workbook.write() in multiple places for different
> purposes/copies.  It looks to me like XSSFRow.onDocumentWrite() explicitly
> replaces the CTRow array of CTCell beans with new ones due to bug, #56170,
> but it is replacing the CTCell references in the XSSFCell objects with the
> new copies, .
>
> I haven't found a smoking gun yet, but any thoughts are welcome.  Perhaps
> the issue may lie with the formula evaluator created prior to the save
> event?
>
> It may be related to that bug, or even directly tied to it, as that was
> about updating tables, and that's part of the updates my process is doing
> as well.
>
> Sorry for the long document, normally I just figure these out on my own,
> but I've been at this a week so far, which has never happened before.  If
> it comes down to black box behavior by XMLBeans, that would help explain
> it, as I know next to nothing about that library's details.
>
> Greg
>

Re: baffling save behavior

Posted by Alain FAGOT BÉAREZ <AB...@FOR-Scala.IT>.
Hi, 

It might have been the same symptoms which have led Sandeep Tiwari to make a code change that I was reluctant to accept in the branch about charts in XWPF. His explanations were in the line of yours. But I couldn't not accept the idea that such a fundamental feature had been broken. 

I hope to have some time this weekend to dive into the commits history to point out any change done recently that might affect that. 

Best regards, 
Alain FAGOT BÉAREZ 

⁣

Von: Greg Woolsey 
Gesendet: Fri Feb 23 03:53:08 GMT-03:00 2018
An: POI Developers List 
Betreff: baffling save behavior

I hesitate to even bring it up, thinking I should be able to figure it out,
but I've never seen behavior like this. Keep reading if you want to twist
your brain in knots like mine.

TL/DR - stop now if you'd rather be productive for your day job.

Somehow, I have a consistent state where a file originating in Excel,
modified through POI, then saved and opened in Excel causes not just an
Excel repair, but incorrect/old data as well.

So far, just a coding error, I figure. But the weirdness starts when I use
the debugger to check on things. A specific row that I know comes out
wrong has the right cell values when I use
wb.getSheet(#).getRow(#).getCell(#).toString(), but when I check the CTRow
and CTCell objects, then I find the old values still. So somehow, I've got
the XSSFRow._cells collection out of sync with the CTRow _row contents.

Anyone know how I could have done that?

I may be calling workbook.write() in multiple places for different
purposes/copies. It looks to me like XSSFRow.onDocumentWrite() explicitly
replaces the CTRow array of CTCell beans with new ones due to bug, #56170,
but it is replacing the CTCell references in the XSSFCell objects with the
new copies, .

I haven't found a smoking gun yet, but any thoughts are welcome. Perhaps
the issue may lie with the formula evaluator created prior to the save
event?

It may be related to that bug, or even directly tied to it, as that was
about updating tables, and that's part of the updates my process is doing
as well.

Sorry for the long document, normally I just figure these out on my own,
but I've been at this a week so far, which has never happened before. If
it comes down to black box behavior by XMLBeans, that would help explain
it, as I know next to nothing about that library's details.

Greg


-------- Originale Nachricht --------
Von: Greg Woolsey <gr...@gmail.com>
Gesendet: Fri Feb 23 03:53:08 GMT-03:00 2018
An: POI Developers List <de...@poi.apache.org>
Betreff: baffling save behavior

I hesitate to even bring it up, thinking I should be able to figure it out,
but I've never seen behavior like this.  Keep reading if you want to twist
your brain in knots like mine.

TL/DR - stop now if you'd rather be productive for your day job.

Somehow, I have a consistent state where a file originating in Excel,
modified through POI, then saved and opened in Excel causes not just an
Excel repair, but incorrect/old data as well.

So far, just a coding error, I figure.  But the weirdness starts when I use
the debugger to check on things.  A specific row that I know comes out
wrong has the right cell values when I use
wb.getSheet(#).getRow(#).getCell(#).toString(), but when I check the CTRow
and CTCell objects, then I find the old values still.  So somehow, I've got
the XSSFRow._cells collection out of sync with the CTRow _row contents.

Anyone know how I could have done that?

I may be calling workbook.write() in multiple places for different
purposes/copies.  It looks to me like XSSFRow.onDocumentWrite() explicitly
replaces the CTRow array of CTCell beans with new ones due to bug, #56170,
but it is replacing the CTCell references in the XSSFCell objects with the
new copies, .

I haven't found a smoking gun yet, but any thoughts are welcome.  Perhaps
the issue may lie with the formula evaluator created prior to the save
event?

It may be related to that bug, or even directly tied to it, as that was
about updating tables, and that's part of the updates my process is doing
as well.

Sorry for the long document, normally I just figure these out on my own,
but I've been at this a week so far, which has never happened before.  If
it comes down to black box behavior by XMLBeans, that would help explain
it, as I know next to nothing about that library's details.

Greg