Posted to dev@struts.apache.org by Mark Takacs <ta...@coscend.com> on 2001/08/04 01:43:14 UTC

File Upload - \n inserts in DiskMultiPart

  We ran into this bug when using Struts uploads on a Microsoft Excel 
spreadsheet that we are trying to run a conversion utility (xlHtml) on. 
  The Excel conversion was failing when done via upload, but not when 
done via command line. After some sleuthing, it looks like Struts is 
clobbering one char at each bufferSize boundary (bufferSize as set in 
web.xml) with a "\n" char.

I added some details to an existing bug covering this problem.  


-tak

------

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=2503

Using struts to handle uploaded files, if the files contain lines > 4k (or the 
file is binary), the file data gets \n characters inserted at the 4k boundaries 
of the long lines.



/------- Additional Comments From Tak <ma...@pacbell.net> 2001-08-03 16:33 -------/

This seems very clear-cut. At least the error, in any case. I'm looking at the
1.0 final codebase.

The 4096 limit comes directly from the (default) value in web.xml

        <init-param>
            <param-name>bufferSize</param-name>
            <param-value>4096</param-value>
        </init-param>

Uploading a (binary) file and doing a cmp results in the following:

cmp -l /tmp/strts27203.tmp ~/myBinaryFileTest.xls
 Byte#  Oct Oct
  4097  12  57
  8194  12 144
 12291  12 142
 16388  12 145
 20485  12 156
 24582  12 147
 28679  12  55
 71584  12   0 

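This says that after every 4096 bytes of data, a linefeed (octal 12, middle
column) has been written in place of the original byte (last column). The first
seven differing offsets are exact multiples of 4097; the last one (71584) is not.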
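
For anyone without cmp handy, the same comparison can be sketched in Java. This
is only a rough stand-in for "cmp -l", not anything from the Struts codebase;
the class name ByteDiff and its diff() method are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper (not part of Struts): prints differing bytes like "cmp -l". */
class ByteDiff {

    /**
     * Returns one line per differing byte: 1-based offset, then the byte from
     * each input in octal. Only the common prefix of the two arrays is compared.
     */
    static List<String> diff(byte[] a, byte[] b) {
        List<String> out = new ArrayList<>();
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                // Mask with 0xFF so bytes >= 0x80 print as unsigned octal values.
                out.add(String.format(Locale.ROOT, "%6d %3o %3o",
                        i + 1, a[i] & 0xFF, b[i] & 0xFF));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: uploaded temp file, args[1]: original file
        byte[] uploaded = Files.readAllBytes(Paths.get(args[0]));
        byte[] original = Files.readAllBytes(Paths.get(args[1]));
        diff(uploaded, original).forEach(System.out::println);
        if (uploaded.length != original.length) {
            System.out.println("lengths differ: " + uploaded.length
                    + " vs " + original.length);
        }
    }
}
```

Run it with the uploaded temp file first and the original second, so the
columns come out in the same order as the cmp listing above.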

upload.DiskMultipartRequestHandler.java seems like a good place to start.  It pulls the
value (4096) out of the config file and (?) cuts up the input into bufferSize'd
chunks which are stored in a Hashtable.  

Whatever is writing that Hashtable back to disk is replacing the last char of
each Hashtable value with a \n and writing the file out.  Or maybe the first char
of the next block?

I'm hoping that bumping the config file value of bufferSize is an acceptable
workaround for now...


--------


Re: File Upload - \n inserts in DiskMultiPart

Posted by Mark Takacs <ta...@coscend.com>.
I found an (ugly) workaround for this problem.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=2503

Index: MultipartIterator.java
===================================================================
RCS file:
/home/cvspublic/jakarta-struts/src/share/org/apache/struts/upload/MultipartIterator.java,v
retrieving revision 1.13.2.1
diff -c -2 -r1.13.2.1 MultipartIterator.java
*** MultipartIterator.java      2001/06/14 01:11:28     1.13.2.1
--- MultipartIterator.java      2001/08/06 18:53:06
***************
*** 34,40 ****
 
      /**
!      * The maximum size in bytes of the buffer used to read lines [4K]
       */
!     public static int MAX_LINE_SIZE = 4096;
 
      /**
--- 34,40 ----
 
      /**
!      * The maximum size in bytes of the buffer used to read lines [64K]
       */
!     public static int MAX_LINE_SIZE = 65536;
 
      /**  


This isn't really a fix, as it just avoids the buggy codepath in 
 MultipartIterator.createLocalFile().   Here's the comment I put in our 
local (hacked) version.

+ // The code for this cutCarriage and cutNewline
+ // mangles binary files where the "line length" is greater
+ // than the MAX_LINE_SIZE.  Note that not all binary files
+ // have lines longer than MAX_LINE_SIZE -- most jpgs (for
+ // instance) consist of many small 'lines' of binary data,
+ // which avoids the cutXX codepath.  Microsoft Excel
+ // spreadsheets, on the other hand, contain HUGE single line
+ // blobs of data (29k), triggering the strange cuts.                            
+ // 
+ // http://nagoya.apache.org/bugzilla/show_bug.cgi?id=2503
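
To check whether a given file is likely to hit this codepath, one could measure
its longest "line", i.e. the longest run of bytes up to and including a \n. The
sketch below is a hypothetical diagnostic, not Struts code; the class name
LongestLine is made up, and whether the real threshold counts the trailing \n
is an assumption I have not verified against MultipartIterator.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical diagnostic (not part of Struts): finds the longest "line" in a file. */
class LongestLine {

    /**
     * Length in bytes of the longest run ending at a '\n' (the '\n' is counted),
     * or of the unterminated tail if that is longer.
     */
    static int longestLine(byte[] data) {
        int longest = 0;
        int current = 0;
        for (byte b : data) {
            current++;
            if (b == '\n') {
                longest = Math.max(longest, current);
                current = 0;
            }
        }
        // The last line may have no terminating '\n'.
        return Math.max(longest, current);
    }

    public static void main(String[] args) throws IOException {
        int maxLineSize = 4096; // MAX_LINE_SIZE before the patch above
        int longest = longestLine(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("longest line: " + longest + " bytes"
                + (longest > maxLineSize ? " (exceeds " + maxLineSize + ")" : ""));
    }
}
```

A file whose longest line stays under MAX_LINE_SIZE (most jpgs, per the comment
above) should upload cleanly; one that exceeds it is a candidate for the cuts.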


If someone a bit more parser-savvy could take a look at the block of 
parser code that sniffs out newlines and tries to remove/add them, that 
would be a real fix.  Do you even need to do that for binary files?   
Maybe cutCarriage/cutNewline should be skipped completely for binary files.

-tak