You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@apr.apache.org by Ivan Zhakov <iv...@visualsvn.com> on 2019/08/24 14:39:17 UTC

Re: svn commit: r1862071 - in /apr/apr/trunk: file_io/os2/dir.c file_io/unix/dir.c file_io/win32/dir.c include/apr_file_info.h test/testdir.c

On Thu, 25 Jul 2019 at 15:58, Ivan Zhakov <iv...@visualsvn.com> wrote:
>
> On Mon, 8 Jul 2019 at 20:05, Ivan Zhakov <iv...@visualsvn.com> wrote:
> >
> > On Wed, 3 Jul 2019 at 18:37, Joe Orton <jo...@redhat.com> wrote:
> > >
> > > On Wed, Jul 03, 2019 at 02:43:02PM +0300, Ivan Zhakov wrote:
> > > > They also make the existing old API unusable for many cases thus
> > > > making a switch to the new API mandatory, even though it doesn't have
> > > > to be so (see below).
> > > >
> > > > APR 1.x did not specify that the result of apr_dir_read() has to be allocated
> > > > in dir->pool:
> > > >
> > > >   https://apr.apache.org/docs/apr/1.7/group__apr__dir.html#ga3e4ee253e0c712160bee10bfb9c8e4a8
> > > >
> > > > This function does not accept a pool argument, and I think that it's natural
> > > > to assume limited lifetime in such case. This is what the current major APR
> > > > users (httpd, svn) do, and other users should probably be ready for that too,
> > > > since on Win32 the subsequent calls to apr_dir_read() already invalidate the
> > > > previous apr_finfo_t.
> > >
> > > Are there many examples in the APR API where returned strings are not
> > > either pool-allocated or constant (process lifetime)?  It seems unusual
> > > and very surprising to me.
> > >
> > > A hypothetical API consumer can have made either assumption:
> > >
> > > a) of constant memory (true with limits on Win32, breaks for Unix), or
> > > b) of pool-allocated string lifetime (works for Unix, breaks for Win32)
> > >
> > > neither is documented and both are broken by current implementations. It
> > > is impossible to satisfy both so anybody can argue that fixing either
> > > implementation to match the other is a regression.
> > >
> > > Doubling down on (a) over (b) seems strongly inferior to me, the API
> > > cannot support this without introducing runtime overhead for all callers
> > > (a new iterpool in the dir object), so I'd rather fix this properly with
> > > a new API.
> > >
> > > I'd be fine with leaving Win32 apr_dir_read() behaviour as-is for 1.x at
> > > the same as introducing _pread() if desired - making the strdup only
> > > happen for _pread would only be a minor tweak, not a whole new subpool.
> > >
> >
> > There is example of apr_hash_t.ITERATOR that is statically allocated in
> > the hash object and returned to the caller.
> >
> > Also the native readdir() API behaves the same way [1]:
> > [[[
> > The returned pointer, and pointers within the structure, might be invalidated
> > or the structure or the storage areas might be overwritten by a subsequent call
> > to readdir() on the same directory stream.
> > ]]]
> >
> > From this perspective the current behavior of apr_dir_read on Unix looks
> > inconsistent and surprising for the API user because it is impossible to use
> > in a loop without unbounded memory usage.
> >
> > I strongly agree that the revved version of this function that accepts a POOL
> > argument will be good from both usage and implementation terms. I would be +1
> > on adding such API. But I also think that unless we fix the original issue
> > by using the preallocated buffer/iterpool, the whole thing would still be
> > problematic for any existing API user.
> >
> > More detailed: if we take this opportunity to choose and document the API to
> > consistently return a temporary result, then as the caller I can either process
> > the data during iteration or manually put it into a long-living pool if that is
> > necessary. I think that all current callers work with the API this way. And I
> > can also switch to the new revved function to manually specify the pool and
> > better control the allocations. If I would like to support both APR 1.x and 2.x
> > the compatibility is also straightforward to achieve.
> >
> > If we instead choose an option where the results may be allocated
> > in the dir object pool, then as the caller I cannot avoid unbounded memory
> > usage when supporting both APR 1.x and 2.x. And this approach also introduces
> > the unavoidable the unbounded memory usage for all Win32 users that
> > do not update their code.
> >
> > [1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html#tag_16_475
> >
> Is there any feedback or thoughts on these proposed changes?
>
> Thanks!
>

For what it's worth, I'm -1 on the changes done in r1862071 and
r1862435 for the reasons listed above.

PS: No idea if I can actually veto changes, because while I'm a
committer on the APR project, I am not in its PMC. But "-1" properly
expresses my technical opinion on this topic.

--
Ivan Zhakov

Re: svn commit: r1862071 - in /apr/apr/trunk: file_io/os2/dir.c file_io/unix/dir.c file_io/win32/dir.c include/apr_file_info.h test/testdir.c

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On Wed, 28 Aug 2019 at 10:08, Joe Orton <jo...@redhat.com> wrote:
>
> On Sat, Aug 24, 2019 at 05:39:17PM +0300, Ivan Zhakov wrote:
> > For what it's worth, I'm -1 on the changes done in r1862071 and
> > r1862435 for the reasons listed above.
> >
> > PS: No idea if I can actually veto changes, because while I'm a
> > committer on the APR project, I am not in its PMC. But "-1" properly
> > expresses my technical opinion on this topic.
>
> OK fair enough, Ivan - reverted in r1866019 and please go ahead with the
> iterpool-based proposal - thanks a lot for reviewing this.
>
Thanks!

I'm currently busy with other thing, but I'll add this problem to my
TODO list and will try to fix it later.

-- 
Ivan Zhakov

Re: svn commit: r1862071 - in /apr/apr/trunk: file_io/os2/dir.c file_io/unix/dir.c file_io/win32/dir.c include/apr_file_info.h test/testdir.c

Posted by Joe Orton <jo...@redhat.com>.
On Sat, Aug 24, 2019 at 05:39:17PM +0300, Ivan Zhakov wrote:
> For what it's worth, I'm -1 on the changes done in r1862071 and
> r1862435 for the reasons listed above.
> 
> PS: No idea if I can actually veto changes, because while I'm a
> committer on the APR project, I am not in its PMC. But "-1" properly
> expresses my technical opinion on this topic.

OK fair enough, Ivan - reverted in r1866019 and please go ahead with the 
iterpool-based proposal - thanks a lot for reviewing this.

Regards, Joe



Re: svn commit: r1862071 - in /apr/apr/trunk: file_io/os2/dir.c file_io/unix/dir.c file_io/win32/dir.c include/apr_file_info.h test/testdir.c

Posted by Branko Čibej <br...@apache.org>.
On 27.08.2019 11:25, Ruediger Pluem wrote:
>
> On 08/25/2019 04:04 PM, Branko Čibej wrote:
>> On 24.08.2019 16:39, Ivan Zhakov wrote:
>>> On Thu, 25 Jul 2019 at 15:58, Ivan Zhakov <iv...@visualsvn.com> wrote:
>>>> On Mon, 8 Jul 2019 at 20:05, Ivan Zhakov <iv...@visualsvn.com> wrote:
>>>>> On Wed, 3 Jul 2019 at 18:37, Joe Orton <jo...@redhat.com> wrote:
>>>>>> On Wed, Jul 03, 2019 at 02:43:02PM +0300, Ivan Zhakov wrote:
>>>>>>> They also make the existing old API unusable for many cases thus
>>>>>>> making a switch to the new API mandatory, even though it doesn't have
>>>>>>> to be so (see below).
>>>>>>>
>>>>>>> APR 1.x did not specify that the result of apr_dir_read() has to be allocated
>>>>>>> in dir->pool:
>>>>>>>
>>>>>>>   https://apr.apache.org/docs/apr/1.7/group__apr__dir.html#ga3e4ee253e0c712160bee10bfb9c8e4a8
>>>>>>>
>>>>>>> This function does not accept a pool argument, and I think that it's natural
>>>>>>> to assume limited lifetime in such case. This is what the current major APR
>>>>>>> users (httpd, svn) do, and other users should probably be ready for that too,
>>>>>>> since on Win32 the subsequent calls to apr_dir_read() already invalidate the
>>>>>>> previous apr_finfo_t.
>>>>>> Are there many examples in the APR API where returned strings are not
>>>>>> either pool-allocated or constant (process lifetime)?  It seems unusual
>>>>>> and very surprising to me.
>>>>>>
>>>>>> A hypothetical API consumer can have made either assumption:
>>>>>>
>>>>>> a) of constant memory (true with limits on Win32, breaks for Unix), or
>>>>>> b) of pool-allocated string lifetime (works for Unix, breaks for Win32)
>>>>>>
>>>>>> neither is documented and both are broken by current implementations. It
>>>>>> is impossible to satisfy both so anybody can argue that fixing either
>>>>>> implementation to match the other is a regression.
>>>>>>
>>>>>> Doubling down on (a) over (b) seems strongly inferior to me, the API
>>>>>> cannot support this without introducing runtime overhead for all callers
>>>>>> (a new iterpool in the dir object), so I'd rather fix this properly with
>>>>>> a new API.
>>>>>>
>>>>>> I'd be fine with leaving Win32 apr_dir_read() behaviour as-is for 1.x at
>>>>>> the same as introducing _pread() if desired - making the strdup only
>>>>>> happen for _pread would only be a minor tweak, not a whole new subpool.
>>>>>>
>>>>> There is example of apr_hash_t.ITERATOR that is statically allocated in
>>>>> the hash object and returned to the caller.
>>>>>
>>>>> Also the native readdir() API behaves the same way [1]:
>>>>> [[[
>>>>> The returned pointer, and pointers within the structure, might be invalidated
>>>>> or the structure or the storage areas might be overwritten by a subsequent call
>>>>> to readdir() on the same directory stream.
>>>>> ]]]
>>>>>
>>>>> From this perspective the current behavior of apr_dir_read on Unix looks
>>>>> inconsistent and surprising for the API user because it is impossible to use
>>>>> in a loop without unbounded memory usage.
>>>>>
>>>>> I strongly agree that the revved version of this function that accepts a POOL
>>>>> argument will be good from both usage and implementation terms. I would be +1
>>>>> on adding such API. But I also think that unless we fix the original issue
>>>>> by using the preallocated buffer/iterpool, the whole thing would still be
>>>>> problematic for any existing API user.
>>>>>
>>>>> More detailed: if we take this opportunity to choose and document the API to
>>>>> consistently return a temporary result, then as the caller I can either process
>>>>> the data during iteration or manually put it into a long-living pool if that is
>>>>> necessary. I think that all current callers work with the API this way. And I
>>>>> can also switch to the new revved function to manually specify the pool and
>>>>> better control the allocations. If I would like to support both APR 1.x and 2.x
>>>>> the compatibility is also straightforward to achieve.
>>>>>
>>>>> If we instead choose an option where the results may be allocated
>>>>> in the dir object pool, then as the caller I cannot avoid unbounded memory
>>>>> usage when supporting both APR 1.x and 2.x. And this approach also introduces
>>>>> the unavoidable the unbounded memory usage for all Win32 users that
>>>>> do not update their code.
>>>>>
>>>>> [1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html#tag_16_475
>>>>>
>>>> Is there any feedback or thoughts on these proposed changes?
>>>>
>>>> Thanks!
>>>>
>>> For what it's worth, I'm -1 on the changes done in r1862071 and
>>> r1862435 for the reasons listed above.
>>>
>>> PS: No idea if I can actually veto changes, because while I'm a
>>> committer on the APR project, I am not in its PMC. But "-1" properly
>>> expresses my technical opinion on this topic.
>>
>> I agree with your analysis. I think it would be best if you just
>> implemented your proposal, including the revised version of the API. The
>> unbounded-memory case you describe is pretty much a showstopper. The
>> allocation choices do have to be documented better, IMO.
>>
>> That would be trunk and 1.7.x, although we can't add a revised API in
>> the 1.7.x line since it's released already. For compatibility it would
>> probably be best to leave apr_dir_read on trunk without the extra pool
>> parameter.
> The API that allows to specify a pool can only be introduced with 1.8 / trunk, but what about the current API which
> needs to stay for 1.8? Either way we fix the code we might break applications. If we use the Windows approach on Unix
> you break applications that rely on getting a copy back, if we use the Unix approach we have an unbounded-memory issue.
> Do you see any other option but choosing one implementation (Windows or Unix) for the other case as well and place
> a fat note about the changed behavior somewhere (Release notes / Change log)?


The way stuff is allocated was only documented a couple months ago on
trunk. Fix those docs and that's it. Anyone who previously relied on
getting a copy was looking at the implementation, not the documentation.

This is release notes/API errata territory.

-- Brane


Re: svn commit: r1862071 - in /apr/apr/trunk: file_io/os2/dir.c file_io/unix/dir.c file_io/win32/dir.c include/apr_file_info.h test/testdir.c

Posted by Ruediger Pluem <rp...@apache.org>.

On 08/25/2019 04:04 PM, Branko Čibej wrote:
> On 24.08.2019 16:39, Ivan Zhakov wrote:
>> On Thu, 25 Jul 2019 at 15:58, Ivan Zhakov <iv...@visualsvn.com> wrote:
>>> On Mon, 8 Jul 2019 at 20:05, Ivan Zhakov <iv...@visualsvn.com> wrote:
>>>> On Wed, 3 Jul 2019 at 18:37, Joe Orton <jo...@redhat.com> wrote:
>>>>> On Wed, Jul 03, 2019 at 02:43:02PM +0300, Ivan Zhakov wrote:
>>>>>> They also make the existing old API unusable for many cases thus
>>>>>> making a switch to the new API mandatory, even though it doesn't have
>>>>>> to be so (see below).
>>>>>>
>>>>>> APR 1.x did not specify that the result of apr_dir_read() has to be allocated
>>>>>> in dir->pool:
>>>>>>
>>>>>>   https://apr.apache.org/docs/apr/1.7/group__apr__dir.html#ga3e4ee253e0c712160bee10bfb9c8e4a8
>>>>>>
>>>>>> This function does not accept a pool argument, and I think that it's natural
>>>>>> to assume limited lifetime in such case. This is what the current major APR
>>>>>> users (httpd, svn) do, and other users should probably be ready for that too,
>>>>>> since on Win32 the subsequent calls to apr_dir_read() already invalidate the
>>>>>> previous apr_finfo_t.
>>>>> Are there many examples in the APR API where returned strings are not
>>>>> either pool-allocated or constant (process lifetime)?  It seems unusual
>>>>> and very surprising to me.
>>>>>
>>>>> A hypothetical API consumer can have made either assumption:
>>>>>
>>>>> a) of constant memory (true with limits on Win32, breaks for Unix), or
>>>>> b) of pool-allocated string lifetime (works for Unix, breaks for Win32)
>>>>>
>>>>> neither is documented and both are broken by current implementations. It
>>>>> is impossible to satisfy both so anybody can argue that fixing either
>>>>> implementation to match the other is a regression.
>>>>>
>>>>> Doubling down on (a) over (b) seems strongly inferior to me, the API
>>>>> cannot support this without introducing runtime overhead for all callers
>>>>> (a new iterpool in the dir object), so I'd rather fix this properly with
>>>>> a new API.
>>>>>
>>>>> I'd be fine with leaving Win32 apr_dir_read() behaviour as-is for 1.x at
>>>>> the same as introducing _pread() if desired - making the strdup only
>>>>> happen for _pread would only be a minor tweak, not a whole new subpool.
>>>>>
>>>> There is example of apr_hash_t.ITERATOR that is statically allocated in
>>>> the hash object and returned to the caller.
>>>>
>>>> Also the native readdir() API behaves the same way [1]:
>>>> [[[
>>>> The returned pointer, and pointers within the structure, might be invalidated
>>>> or the structure or the storage areas might be overwritten by a subsequent call
>>>> to readdir() on the same directory stream.
>>>> ]]]
>>>>
>>>> From this perspective the current behavior of apr_dir_read on Unix looks
>>>> inconsistent and surprising for the API user because it is impossible to use
>>>> in a loop without unbounded memory usage.
>>>>
>>>> I strongly agree that the revved version of this function that accepts a POOL
>>>> argument will be good from both usage and implementation terms. I would be +1
>>>> on adding such API. But I also think that unless we fix the original issue
>>>> by using the preallocated buffer/iterpool, the whole thing would still be
>>>> problematic for any existing API user.
>>>>
>>>> More detailed: if we take this opportunity to choose and document the API to
>>>> consistently return a temporary result, then as the caller I can either process
>>>> the data during iteration or manually put it into a long-living pool if that is
>>>> necessary. I think that all current callers work with the API this way. And I
>>>> can also switch to the new revved function to manually specify the pool and
>>>> better control the allocations. If I would like to support both APR 1.x and 2.x
>>>> the compatibility is also straightforward to achieve.
>>>>
>>>> If we instead choose an option where the results may be allocated
>>>> in the dir object pool, then as the caller I cannot avoid unbounded memory
>>>> usage when supporting both APR 1.x and 2.x. And this approach also introduces
>>>> the unavoidable the unbounded memory usage for all Win32 users that
>>>> do not update their code.
>>>>
>>>> [1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html#tag_16_475
>>>>
>>> Is there any feedback or thoughts on these proposed changes?
>>>
>>> Thanks!
>>>
>> For what it's worth, I'm -1 on the changes done in r1862071 and
>> r1862435 for the reasons listed above.
>>
>> PS: No idea if I can actually veto changes, because while I'm a
>> committer on the APR project, I am not in its PMC. But "-1" properly
>> expresses my technical opinion on this topic.
> 
> 
> I agree with your analysis. I think it would be best if you just
> implemented your proposal, including the revised version of the API. The
> unbounded-memory case you describe is pretty much a showstopper. The
> allocation choices do have to be documented better, IMO.
> 
> That would be trunk and 1.7.x, although we can't add a revised API in
> the 1.7.x line since it's released already. For compatibility it would
> probably be best to leave apr_dir_read on trunk without the extra pool
> parameter.

The API that allows to specify a pool can only be introduced with 1.8 / trunk, but what about the current API which
needs to stay for 1.8? Either way we fix the code we might break applications. If we use the Windows approach on Unix
you break applications that rely on getting a copy back, if we use the Unix approach we have an unbounded-memory issue.
Do you see any other option but choosing one implementation (Windows or Unix) for the other case as well and place
a fat note about the changed behavior somewhere (Release notes / Change log)?

Regards

Rüdiger



Re: svn commit: r1862071 - in /apr/apr/trunk: file_io/os2/dir.c file_io/unix/dir.c file_io/win32/dir.c include/apr_file_info.h test/testdir.c

Posted by Branko Čibej <br...@apache.org>.
On 24.08.2019 16:39, Ivan Zhakov wrote:
> On Thu, 25 Jul 2019 at 15:58, Ivan Zhakov <iv...@visualsvn.com> wrote:
>> On Mon, 8 Jul 2019 at 20:05, Ivan Zhakov <iv...@visualsvn.com> wrote:
>>> On Wed, 3 Jul 2019 at 18:37, Joe Orton <jo...@redhat.com> wrote:
>>>> On Wed, Jul 03, 2019 at 02:43:02PM +0300, Ivan Zhakov wrote:
>>>>> They also make the existing old API unusable for many cases thus
>>>>> making a switch to the new API mandatory, even though it doesn't have
>>>>> to be so (see below).
>>>>>
>>>>> APR 1.x did not specify that the result of apr_dir_read() has to be allocated
>>>>> in dir->pool:
>>>>>
>>>>>   https://apr.apache.org/docs/apr/1.7/group__apr__dir.html#ga3e4ee253e0c712160bee10bfb9c8e4a8
>>>>>
>>>>> This function does not accept a pool argument, and I think that it's natural
>>>>> to assume limited lifetime in such case. This is what the current major APR
>>>>> users (httpd, svn) do, and other users should probably be ready for that too,
>>>>> since on Win32 the subsequent calls to apr_dir_read() already invalidate the
>>>>> previous apr_finfo_t.
>>>> Are there many examples in the APR API where returned strings are not
>>>> either pool-allocated or constant (process lifetime)?  It seems unusual
>>>> and very surprising to me.
>>>>
>>>> A hypothetical API consumer can have made either assumption:
>>>>
>>>> a) of constant memory (true with limits on Win32, breaks for Unix), or
>>>> b) of pool-allocated string lifetime (works for Unix, breaks for Win32)
>>>>
>>>> neither is documented and both are broken by current implementations. It
>>>> is impossible to satisfy both so anybody can argue that fixing either
>>>> implementation to match the other is a regression.
>>>>
>>>> Doubling down on (a) over (b) seems strongly inferior to me, the API
>>>> cannot support this without introducing runtime overhead for all callers
>>>> (a new iterpool in the dir object), so I'd rather fix this properly with
>>>> a new API.
>>>>
>>>> I'd be fine with leaving Win32 apr_dir_read() behaviour as-is for 1.x at
>>>> the same as introducing _pread() if desired - making the strdup only
>>>> happen for _pread would only be a minor tweak, not a whole new subpool.
>>>>
>>> There is example of apr_hash_t.ITERATOR that is statically allocated in
>>> the hash object and returned to the caller.
>>>
>>> Also the native readdir() API behaves the same way [1]:
>>> [[[
>>> The returned pointer, and pointers within the structure, might be invalidated
>>> or the structure or the storage areas might be overwritten by a subsequent call
>>> to readdir() on the same directory stream.
>>> ]]]
>>>
>>> From this perspective the current behavior of apr_dir_read on Unix looks
>>> inconsistent and surprising for the API user because it is impossible to use
>>> in a loop without unbounded memory usage.
>>>
>>> I strongly agree that the revved version of this function that accepts a POOL
>>> argument will be good from both usage and implementation terms. I would be +1
>>> on adding such API. But I also think that unless we fix the original issue
>>> by using the preallocated buffer/iterpool, the whole thing would still be
>>> problematic for any existing API user.
>>>
>>> More detailed: if we take this opportunity to choose and document the API to
>>> consistently return a temporary result, then as the caller I can either process
>>> the data during iteration or manually put it into a long-living pool if that is
>>> necessary. I think that all current callers work with the API this way. And I
>>> can also switch to the new revved function to manually specify the pool and
>>> better control the allocations. If I would like to support both APR 1.x and 2.x
>>> the compatibility is also straightforward to achieve.
>>>
>>> If we instead choose an option where the results may be allocated
>>> in the dir object pool, then as the caller I cannot avoid unbounded memory
>>> usage when supporting both APR 1.x and 2.x. And this approach also introduces
>>> the unavoidable the unbounded memory usage for all Win32 users that
>>> do not update their code.
>>>
>>> [1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html#tag_16_475
>>>
>> Is there any feedback or thoughts on these proposed changes?
>>
>> Thanks!
>>
> For what it's worth, I'm -1 on the changes done in r1862071 and
> r1862435 for the reasons listed above.
>
> PS: No idea if I can actually veto changes, because while I'm a
> committer on the APR project, I am not in its PMC. But "-1" properly
> expresses my technical opinion on this topic.


I agree with your analysis. I think it would be best if you just
implemented your proposal, including the revised version of the API. The
unbounded-memory case you describe is pretty much a showstopper. The
allocation choices do have to be documented better, IMO.

That would be trunk and 1.7.x, although we can't add a revised API in
the 1.7.x line since it's released already. For compatibility it would
probably be best to leave apr_dir_read on trunk without the extra pool
parameter.

-- Brane