You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@cocoon.apache.org by Gianugo Rabellino <gi...@apache.org> on 2003/07/31 16:35:16 UTC

Flow's processPipelineTo() and FileSource

I'm having a hell of a time using flow with processPipelineTo() and 
OutputStreams coming out from FileSource(s).

The problem is that FileSource#getOutputStream() creates a temporary 
file (... to be discussed later ...) and such file gets renamed to the 
original one only upon OutputStream.close(). Now, AbstractInterpreter, 
line 201, actually calls flush() but *never* close. As a result, 
everything is kinda ... well... screwed up.

Patch is trivial, but I'm wondering if adding out.close() in 
AbstractInterpreter.java might break something: any flow experts around?

Now for the FileSource: I do understand *some* of the reasoning behind 
using a temporary file, but I have to disagree on the implementation: 
naming it [filename].tmp is a bit of a bet, since someone might 
legitimately have such a filename around. While I understand that there 
might be memory issues with large files, I guess that either:

1. keeping a ByteArrayOutputStream;
2. forget about it and just write the file;
3. use a more "clever" name that doesn't risk conflicts this much

are all better options.

Is that OK to you if I work on it? I don't know if I have access to the 
Excalibur CVS though...

Ciao,

-- 
Gianugo Rabellino
Pro-netics s.r.l. -  http://www.pro-netics.com
Orixo, the XML business alliance - http://www.orixo.com
     (Now blogging at: http://blogs.cocoondev.org/gianugo/)

Re: Flow's processPipelineTo() and FileSource

Posted by Gianugo Rabellino <gi...@apache.org>.

Sylvain Wallez wrote:

>> I'm having a hell of a time using flow with processPipelineTo() and 
>> OutputStreams coming out from FileSource(s).
>>
>> The problem is that FileSource#getOutputStream() creates a temporary 
>> file (... to be discussed later ...) and such file gets renamed to the 
>> original one only upon OutputStream.close(). Now, AbstractInterpreter, 
>> line 201, actually calls flush() but *never* close. As a result, 
>> everything is kinda ... well... screwed up.
>>
>> Patch is trivial, but I'm wondering if adding out.close() in 
>> AbstractInterpreter.java might break something: any flow experts around? 
> 
> I don't see why there should be some consequences on the flow itself... 
> Just replace flush() by close() !

Just did it, but I didn't replace flush(), just added close() 
afterwards: it's better to be sure that there are no leftovers...

>> Now for the FileSource: I do understand *some* of the reasoning behind 
>> using a temporary file, but I have to disagree on the implementation: 
>> naming it [filename].tmp is a bit of a bet, since someone might 
>> legitimately have such a filename around. While I understand that 
>> there might be memory issues with large files, I guess that either:
>>
>> 1. keeping a ByteArrayOutputStream;
>> 2. forget about it and just write the file;
>> 3. use a more "clever" name that doesn't risk conflicts this much 
> 
> 
> 
> I would avoid 2. The reason why I used a temporary file is because of 
> the streamed nature of Cocoon pipelines. If an error occurs within the 
> processing, the original content is not partially overwritten. My 
> preference would go to 3.

I see and understand. Yet temporary files, besides being somehow 
inconvenient, can be a major security hole in general. I'd rather go for 
1, then, accumulating bytes as they come on a ByteArrayOutputStream and 
writing them upon close() (and maybe flush() too?). True, this is in 
turn a possible security hole since someone might DOS the machine by 
processing gigabyte-sized files, but all in all I tend to think that 
it's a better solution... and yes, doing transaction on a filesystem is 
a PITA. :-)

Ciao,

>> are all better options.
>>
>> Is that OK to you if I work on it? I don't know if I have access to 
>> the Excalibur CVS though... 
> 
> As a Cocoon committer, you should.

I understand that I am authorized in line of principle, just don't know 
if I need to be explicitely enabled. Anyway, I'll check it out. :-)

Thanks for everything,

-- 
Gianugo Rabellino
Pro-netics s.r.l. -  http://www.pro-netics.com
Orixo, the XML business alliance - http://www.orixo.com
     (Now blogging at: http://blogs.cocoondev.org/gianugo/)

Re: Flow's processPipelineTo() and FileSource

Posted by Gianugo Rabellino <gi...@apache.org>.

Christopher Oliver wrote:
> Why can't you just call "close()" in your flowscript? I don't see 
> anything in the contract of processPipelineTo() that indicates that it 
> should close the stream. In my opinion, calling "flush()" but not 
> "close()" as in the original implementation is correct.

On second thought, yes, you're right. The OutputStream might be reused 
later, and if you call close() in the flow you miss that opportunity. 
I'm reverting the patch right now: next on the list is finding out why 
close() in flowscript was giving me problems, while putting it there 
solved everything. Probably something wrong in FileSource.

Thanks,

-- 
Gianugo Rabellino
Pro-netics s.r.l. -  http://www.pro-netics.com
Orixo, the XML business alliance - http://www.orixo.com
     (Now blogging at: http://blogs.cocoondev.org/gianugo/)

Re: Flow's processPipelineTo() and FileSource

Posted by Christopher Oliver <re...@verizon.net>.

Why can't you just call "close()" in your flowscript? I don't see 
anything in the contract of processPipelineTo() that indicates that it 
should close the stream. In my opinion, calling "flush()" but not 
"close()" as in the original implementation is correct.

My $0.02,

Chris

Sylvain Wallez wrote:

> Gianugo Rabellino wrote:
>
>> I'm having a hell of a time using flow with processPipelineTo() and 
>> OutputStreams coming out from FileSource(s).
>>
>> The problem is that FileSource#getOutputStream() creates a temporary 
>> file (... to be discussed later ...) and such file gets renamed to 
>> the original one only upon OutputStream.close(). Now, 
>> AbstractInterpreter, line 201, actually calls flush() but *never* 
>> close. As a result, everything is kinda ... well... screwed up.
>>
>> Patch is trivial, but I'm wondering if adding out.close() in 
>> AbstractInterpreter.java might break something: any flow experts around? 
>
>
>
> I don't see why there should be some consequences on the flow 
> itself... Just replace flush() by close() !
>
>> Now for the FileSource: I do understand *some* of the reasoning 
>> behind using a temporary file, but I have to disagree on the 
>> implementation: naming it [filename].tmp is a bit of a bet, since 
>> someone might legitimately have such a filename around. While I 
>> understand that there might be memory issues with large files, I 
>> guess that either:
>>
>> 1. keeping a ByteArrayOutputStream;
>> 2. forget about it and just write the file;
>> 3. use a more "clever" name that doesn't risk conflicts this much 
>
>
>
> I would avoid 2. The reason why I used a temporary file is because of 
> the streamed nature of Cocoon pipelines. If an error occurs within the 
> processing, the original content is not partially overwritten. My 
> preference would go to 3.
>
>> are all better options.
>>
>> Is that OK to you if I work on it? I don't know if I have access to 
>> the Excalibur CVS though... 
>
>
>
> As a Cocoon committer, you should.
>
> Sylvain
>

Re: Flow's processPipelineTo() and FileSource

Posted by Sylvain Wallez <sy...@anyware-tech.com>.

Gianugo Rabellino wrote:

> I'm having a hell of a time using flow with processPipelineTo() and 
> OutputStreams coming out from FileSource(s).
>
> The problem is that FileSource#getOutputStream() creates a temporary 
> file (... to be discussed later ...) and such file gets renamed to the 
> original one only upon OutputStream.close(). Now, AbstractInterpreter, 
> line 201, actually calls flush() but *never* close. As a result, 
> everything is kinda ... well... screwed up.
>
> Patch is trivial, but I'm wondering if adding out.close() in 
> AbstractInterpreter.java might break something: any flow experts around? 


I don't see why there should be some consequences on the flow itself... 
Just replace flush() by close() !

> Now for the FileSource: I do understand *some* of the reasoning behind 
> using a temporary file, but I have to disagree on the implementation: 
> naming it [filename].tmp is a bit of a bet, since someone might 
> legitimately have such a filename around. While I understand that 
> there might be memory issues with large files, I guess that either:
>
> 1. keeping a ByteArrayOutputStream;
> 2. forget about it and just write the file;
> 3. use a more "clever" name that doesn't risk conflicts this much 


I would avoid 2. The reason why I used a temporary file is because of 
the streamed nature of Cocoon pipelines. If an error occurs within the 
processing, the original content is not partially overwritten. My 
preference would go to 3.

> are all better options.
>
> Is that OK to you if I work on it? I don't know if I have access to 
> the Excalibur CVS though... 


As a Cocoon committer, you should.

Sylvain

-- 
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Orixo, the opensource XML business alliance  -  http://www.orixo.com