Posted to user@commons.apache.org by "P. Ottlinger" <po...@apache.org> on 2021/11/10 18:34:59 UTC

[csv] Does the library provide means to circumvent CSV injection

Hi,

I just stumbled upon
https://owasp.org/www-community/attacks/CSV_Injection#
and asked myself whether Commons CSV provides a means to circumvent this
kind of attack.

If the library handles these special characters and prevents attacks
from working, it should be mentioned on the homepage.

If it doesn't handle them, I'd like to know how customers/users prevent
this kind of attack. Maybe there's a working solution that can easily be
integrated into Commons CSV?

Thanks,
Phil


Re: [csv] Does the library provide means to circumvent CSV injection

Posted by Gary Gregory <ga...@gmail.com>.
I agree with Sebb on all his points.

Gary


Re: [csv] Does the library provide means to circumvent CSV injection

Posted by sebb <se...@gmail.com>.
On Thu, 11 Nov 2021 at 22:00, Bruno P. Kinoshita
<br...@yahoo.com.br.invalid> wrote:
>
>  Maybe we could provide a flag that disables the characters mentioned in the OWASP page about CSV Injection [1]

No, please no.

> Normally I suggest going secure by default, but in this case I think the flag should be disabled by default, as filtering cells that start with "equals to" could break in cases where it was used for a legit formula for excel/openoffice/libreoffice/etc.

Or maybe the CSV contains data for another application that requires
these meta-characters.
Or maybe there is another application that blows up when accents are used.
Etc.

It is just not the right place to implement such checks.

> The OWASP page also acknowledges the difficulty of this issue I think:
> > This attack is difficult to mitigate, and explicitly disallowed from quite a few bug bounty programs.
> Just my 0.02 cents.
> Bruno

Yes, it is difficult (if not impossible) to mitigate, especially at
the level of a component like Commons CSV.
We don't know the context of how CSV will be used.

Do we even want to take responsibility for guaranteeing that Commons
CSV prevents all CSV injection attacks?

Even if we were able to release a perfect solution, it would not
prevent injection attacks via direct access to CSV files, which is
surely the main vector.

Seems to me we should just update the Javadoc to mention the issue,
and let the application handle it.

> [1] https://owasp.org/www-community/attacks/CSV_Injection

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org


Re: [csv] Does the library provide means to circumvent CSV injection

Posted by "Bruno P. Kinoshita" <br...@yahoo.com.br.INVALID>.
Maybe we could provide a flag that disables the characters mentioned in the OWASP page about CSV Injection [1].

Normally I suggest going secure by default, but in this case I think the flag should be disabled by default, as filtering cells that start with "equals to" could break cases where it was used for a legit formula for Excel/OpenOffice/LibreOffice/etc.

The OWASP page also acknowledges the difficulty of this issue, I think:
> This attack is difficult to mitigate, and explicitly disallowed from quite a few bug bounty programs.

Just my 0.02 cents.
Bruno

[1] https://owasp.org/www-community/attacks/CSV_Injection

Re: [csv] Does the library provide means to circumvent CSV injection

Posted by Bernd Eckenfels <ec...@zusammenkunft.net>.
Hello,

I don’t really agree; a generic CSV tool should have a flag to protect against this, since it is a very common requirement. The situation is very unfortunate, which is why there is no good solution by default, but I can assume many software vendors working in the area of Windows-based enterprise desktops and exporting files as CSV downloads want to enable this.

Having said that, I'm not sure whether quoting is actually enough, and everyone should prefer XML-based office formats anyway.

I wouldn’t mind accepting a tested patch for such an option. Maybe even an unsafe-pass-default/quote-injection/reject-injection enum.
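
A minimal sketch of what such a three-way option might look like. Everything here is hypothetical: the names `InjectionPolicy`, `PASS`, `QUOTE`, `REJECT` and `looksLikeFormula` are made up for illustration and are not part of Commons CSV, and the set of trigger characters is assumed from the OWASP page.

```java
/** Hypothetical policy enum; illustrative only, not a Commons CSV API. */
enum InjectionPolicy {
    /** Write the cell unchanged. */
    PASS {
        @Override
        String apply(String cell) {
            return cell;
        }
    },
    /** Prefix a single quote so that spreadsheet apps treat the cell as text. */
    QUOTE {
        @Override
        String apply(String cell) {
            return looksLikeFormula(cell) ? "'" + cell : cell;
        }
    },
    /** Refuse to write a cell that could be interpreted as a formula. */
    REJECT {
        @Override
        String apply(String cell) {
            if (looksLikeFormula(cell)) {
                throw new IllegalArgumentException("possible formula: " + cell);
            }
            return cell;
        }
    };

    abstract String apply(String cell);

    /** Assumed trigger set: '=', '+', '-', '@', tab, carriage return. */
    static boolean looksLikeFormula(String cell) {
        return cell != null && !cell.isEmpty()
                && "=+-@\t\r".indexOf(cell.charAt(0)) >= 0;
    }
}
```

An application would pick one policy and call `apply` on each cell before handing it to the CSV writer; defaulting to `PASS` would keep existing behaviour for callers that need the meta-characters.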

Regards
Bernd


--
http://bernd.eckenfels.net


Re: [csv] Does the library provide means to circumvent CSV injection

Posted by sebb <se...@gmail.com>.
On Thu, 11 Nov 2021 at 11:36, P. Ottlinger <po...@apache.org> wrote:
>
> Hi guys,
>
> thanks for your reply.
>
> Maybe I'm misinterpreting something but I thought that it could be made
> possible to configure CSVFormat-object when writing the CSV data in a
> way that any data with possibly corrupting values (as shown on the OWASP
> page) will mask the whole contents of the cell.
>
> Thus a library such as commons-csv would be able to lower the risk for
> CSV injection and not every client/customer would have to manually
> create this protecting logic.
>
> To my mind it's a simple parser for "dangerous" tokens that quotes the
> given data with additional " .... as we do not need to write
> functioning Excel formulas into CSV.
>
> WDYT?

As the others have said, this is the wrong place to be looking to fix
the problem.

CSV files are used for lots of things other than spreadsheets, so what
is dangerous in one application might be essential in another.

Besides, not all CSV files will be processed by Commons CSV on their
route to a spreadsheet app.

Such checks need to be made at the input to the application that processes it.

> Cheers,
> Phil
>
> On 10.11.21 at 20:53, Gary Gregory wrote:
> > I agree with Matt. CSV is just a container, it doesn't know or care what
> > the concept of a "formula" is.
> >
> > Gary
>



Re: [csv] Does the library provide means to circumvent CSV injection

Posted by "Kevin W. Wall" <ke...@gmail.com>.
Comments from Matt's partner in crime inline, below. -kevin
[This probably will bounce on the Commons Users List since I'm not
subscribed to that mailing list, but if you want it there, any of you who
are subscribed have my permission to forward it.]

On Thu, Nov 11, 2021 at 7:11 PM Matt Seil <ms...@acm.org> wrote:

> The TLDR version:  OWASP's recommendation is specifically to render code
> intended to be executed as unexecutable.  I'd suggest a fix be done at
> OWASP-Java-Encoder project and not here.  I believe the suggestion of
> providing this feature even at OWASP  has near-zero value in the long run
> because the purpose of formulas in Excel IS to be executed--and Microsoft
> already offers the best speed bump.  Here be dragons!
>
> cc'ing my partner in crime.
>
> ============================
>
> I apologize.  This is going to be a TLDR response because I don't know any
> of you professionally so I'm erring on the side of completeness.  Sincere
> apologies if I'm stating things you believe to be obvious, or am myself
> ignorant of something obvious.
>
> So I think there's a misunderstanding in regards to the threat described
> by the OWASP article.  The threat is explicitly *FORMULA* execution in
> Excel--and LibreOffice.  It sounds similar to a browser problem but it's
> not, it's far worse. The reason why this particular threat tends to be out
> of bounds in bug bounty programs and in CTF contests is that the attack
> that exploits this is a social engineering attack which always works in the
> real world.  Hence why bug bounties won't pay out for it.
>
> The recommendation from OWASP is as follows:
>
> Encode the offending characters to:
>
>    - Equals to (=)
>    - Plus (+)
>    - Minus (-)
>    - At (@)
>    - Tab (0x09)
>    - Carriage return (0x0D)
>    - The set [;',"] be similarly escaped
>
> While this would be a mitigation, it would also *purposefully break any
> formulas* placed into a csv cell.  This is a critical point, and I'll
> come back to it later.   It's all or nothing.
>
> This is where Phil's comment comes in:
>
> "Maybe I'm misinterpreting something but I thought that it could be made
> possible to configure CSVFormat-object when writing the CSV data in a
> way that any data with possibly corrupting values (as shown on the OWASP
> page) will mask the whole contents of the cell."
>
> First, let me stress again the risk:  The threat isn't masking cell
> contents, it's *execution* of normal logic in a malicious way.  This is
> the €1M question:  "How do we differentiate corrupting values from valid
> values?"
>

I think it is even more specific than that. The problem is the
application used to read the CSV file and render it as a table /
spreadsheet. While I've seen CSV files with something like:

=cmd|' /C calc'!A0

pop calc.exe in demos on Windows with Excel, I'm not sure the same thing
(with an appropriately renamed calculator application) would launch in
(say) macOS with their 'Numbers.app' or in Linux with LibreOffice Calc. To
me, the main level of responsibility for preventing this should be with
whatever application handles the CSV file, not the application that accepts
it to be uploaded.

To put it a different way, if you allowed general untrusted users to
upload .html files to some web application (which we all agree would be
dumb), I don't think we'd expect that the web application that allows that
should be responsible for trying to parse the HTML file to make sure that
all the links were safe and that there was no malicious JavaScript
contained in it. [Note: BTW, we *do* have a very similar problem with
general PDFs being uploaded, but there most of those PDF readers (looking
at you, Adobe) have finally learned it's bad for sales if you get a
shitload of bad PR because Acrobat or Acrobat Reader automatically enables
the interpretation of JavaScript for the convenience of its users.]

Or if they can upload image files, do we have to write parsers that
correctly verify malformed GIF, TIFF, JPEG, or PNG images just because one
popular library might have a buffer overflow in it or a use after free
issue? (Nah. That would never happen.)

Clearly, if you are going to insist on allowing potentially malicious
artifacts to be uploaded, you want to deploy a broader no code / low code
solution, so you drop the file into the file system (before you potentially
store it in your DB) and rely on malware detection scanning to spot and
quarantine the ones with the evil bits.

>
> Asking this csv library to do it means it has to take on quite a bit of
> intelligence.  It doesn't just have to understand what a CSV format is
> anymore.  It has to answer questions like "*What's a corrupt equal sign
> look like?*"  And it looks like a valid equal sign.  So to do this right,
> you have to do lexical analysis and parsing the same way that Excel is
> going to do it, and THEN you have to infer behavior.
>
I should not be able to *create* evil bits using whatever CSV library
(Apache Commons CSV or any other) that you are referring to here, but I
agree that it should not have to filter out the evil bits itself. What I
would do is allow the user of said library to register their own method /
function that the library's parser uses as a callback to validate the
input of each individual parsed field. That validation function could be as
simple as just returning a boolean, or it could be more complicated and
return an array of the fields (by position) that failed the validation. But
it then pushes the responsibility into the hands of the developers, who
should have some idea of what they want to do with it and whether or not
they expect things like popping up a calculator to be allowed.
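
The callback idea above could be sketched roughly as follows. This is only an illustration in plain Java: `FieldValidation` and `failedPositions` are hypothetical names, nothing here is an existing Commons CSV hook, and the record is assumed to have already been parsed into a list of field strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper showing a caller-supplied validation callback. */
final class FieldValidation {
    private FieldValidation() {}

    /**
     * Applies the caller's predicate to every parsed field and returns the
     * zero-based positions of the fields it rejected.
     */
    static List<Integer> failedPositions(List<String> fields,
                                         Predicate<String> isAcceptable) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < fields.size(); i++) {
            if (!isAcceptable.test(fields.get(i))) {
                failed.add(i);
            }
        }
        return failed;
    }
}
```

A caller that does not want formula-like cells might pass a predicate such as `f -> f.isEmpty() || "=+-@\t\r".indexOf(f.charAt(0)) < 0`, while a caller that needs formulas would pass `f -> true`. The library stays ignorant of what "dangerous" means, which is the point of the proposal.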

>
> Therefore to determine what corrupt characters look like given data
> designed to be executed you are now in the business of trying to interpret
> what the Excel formula is doing, in order to determine whether or not it's
> safe.
>
Yeah, you cannot automatically make that determination because you don't
have additional context. But if you provide a mechanism for a callback
function, you put that responsibility back in the hands of someone who
might be able to decide that. (This would also allow them to write an FCC
spreadsheet filter for George Carlin's 7 Dirty Words! :)

> This is the core problem:  formulas are bits of *user-supplied code designed
> to be executed*.  If you escape it, you break it.  At best, you annoy the
> hell out of the accountant who was expecting your web app to offer a usable
> spreadsheet, while adding one layer of manual intervention other than the
> standard warning that MS Office provides whenever you open an Excel file not
> created on your machine.
>
> So... what can we do about it?  Microsoft already did it:
>
> IMHO there's nothing that any intermediary library can do that's any
> better than this.    Web applications designed to take spreadsheets as
> input are special beasts.  The proper security rule of thumb is to always
> ensure DATA is treated as DATA.  But that rule gets *really funky* when
> that DATA is actually supposed to be executable code.  But that's your
> choice:  if you don't want it to execute you have to force it to be data,
> which will break execution by programmer intent.
>
> However, I suspect a few of you will be unhappy with my "do nothing"
> suggestion and insist that something ought to be done.
>
> I would recommend writing a CSV encoder for the owasp-java-encoder
> project.  https://github.com/OWASP/owasp-java-encoder The framework is
> already in place and it's where I push people if they only need encoding
> functions.
>
I seem to recall someone recently asking Jim Manico about that in a GitHub
issue or discussion. That memory is fuzzy, but IIRC, I think he resisted
the idea. If someone actually implemented it and appropriate tests and made
a PR to support it though, they might make it happen. (But keep in mind
that if someone is going to try that, it has to work with Java 5!!!)


> Why I wouldn't do it here:  libraries like this have to be written to the
> lowest-common-denominator, meaning csv format projects that don't have
> Excel as a target.
>
Ding, ding, ding!  Give that man an HTTP cookie. Excel isn't the only
player on the block. They weren't even the first. (They're just the biggest
and maybe most obnoxious, but I digress.)

> You want security functions to process as close to the business logic as
> possible, and this is the wrong target for that.  Doing it here means not
> breaking legacy code, which means by default, the option will be off.  (Or
> you follow a deprecation strategy.)
>
And it's worse than it is for web browsers, which are (supposed to be)
loosely standards based, based on W3C and IETF standards. It's also worse in
other ways, because people keep spreadsheets for 20+ years and still expect
them to work. (Guilty!)  AFAIK, there ain't no spreadsheet
interoperability standards committee, so it will likely only change after a
massive public breach.

> Further--this gets to my original hint about threat models--executing
> formulas in cells is a *desired function* of Excel and its copies.  When
> developers start breaking spreadsheets they're going to revert to legacy
> behavior meaning you're really talking about improving the defensive
> capability for the security-minded developers that can stand up to the
> finance department.  When OWASP tells you "This attack is difficult to
> mitigate," it isn't just the technical issues involved--which I just
> outlined--it's social.  This is why I'm hesitant to offer up "We'll do it in
> ESAPI," because I don't see the value-add in the bigger picture.  Plus, *this
> is Microsoft's fault* and I'm not thrilled with writing code to speedbump
> *their* problem.  Which, I feel they've addressed as well as they ever
> will.
>

Agree. We will NOT do it in ESAPI. Not the output encoding, at least; it's
just totally wrong for that. If someone wants to write a PR to add it to
the (IMO) already bloated Validator, we *might* be able to accommodate it,
but it just depends on how complicated the code is and how soon it's needed
and how good the test suite for it is.
-kevin

-- 
Blog: https://off-the-wall-security.blogspot.com/    | Twitter: @KevinWWall
| OWASP ESAPI Project co-lead
NSA: All your crypto bit are belong to us.

Re: [csv] Does the library provide means to circumvent CSV injection

Posted by Bernd Eckenfels <ec...@zusammenkunft.net>.
Btw, Microsoft’s macro execution prevention does not protect against formulas in CSV. Different pop-ups (at least in Office 2016) are shown in the formula injection case, especially when using the cmd| mechanism, but it’s rather easy to allow it (if administrators don’t prevent it with a group policy). It is a bit unfortunate, as it tells you that you need to trust the source, and who would distrust their ERP or their bank… Not to mention that Google Sheets is also affected. For that reason, a credible SaaS does have to filter those.

Regards
Bernd
--
http://bernd.eckenfels.net
________________________________
From: Matt Seil <xe...@gmail.com> on behalf of Matt Seil <ms...@acm.org>
Sent: Friday, November 12, 2021 1:11:19 AM
To: Commons Users List <us...@commons.apache.org>; P. Ottlinger <po...@apache.org>; Gary Gregory <ga...@gmail.com>; Kevin W. Wall <ke...@gmail.com>
Subject: Re: [csv] Does the library provide means to circumvent CSV injection


The TLDR version:  OWASP's recommendation is specifically to render code intended to be executed as unexecutable.  I'd suggest a fix be done at OWASP-Java-Encoder project and not here.  I believe the suggestion of providing this feature even at OWASP  has near-zero value in the long run because the purpose of formulas in Excel IS to be executed--and Microsoft already offers the best speed bump.  Here be dragons!

cc'ing my partner in crime.

============================

I apologize.  This is going to be a TLDR response because I don't know any of you professionally so I'm erring on the side of completeness.  Sincere apologies if I'm stating things you believe to be obvious, or am myself ignorant of something obvious.

So I think there's a misunderstanding in regards to the threat described by the OWASP article.  The threat is explicitly FORMULA execution in Excel--and LibreOffice.  It sounds similar to a browser problem but it's not, it's far worse. The reason why this particular threat tends to be out of bounds in bug bounty programs and in CTF contests is that the attack that exploits this is a social engineering attack which always works in the real world.  Hence why bug bounties won't pay out for it.

The recommendation from OWASP is as follows:

Encode the offending characters to:

  *   Equals to (=)
  *   Plus (+)
  *   Minus (-)
  *   At (@)
  *   Tab (0x09)
  *   Carriage return (0x0D)
  *   The set [;',"] be similarly escaped

While this would be a mitigation, it would also purposefully break any formulas placed into a csv cell.  This is a critical point, and I'll come back to it later.   It's all or nothing.
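
As a rough illustration of the escaping described above (a sketch only, not OWASP's or Commons CSV's code; the class and method names are hypothetical), assume the mitigation is to prepend a single quote to any cell that begins with one of the listed trigger characters, to double embedded double quotes, and to wrap the field in double quotes:

```java
import java.util.Set;

/** Hypothetical helper; illustrative only, not part of any library. */
final class CsvCellNeutralizer {
    // Leading characters assumed from the list above.
    private static final Set<Character> TRIGGERS =
            Set.of('=', '+', '-', '@', '\t', '\r');

    private CsvCellNeutralizer() {}

    /** Prefixes a single quote when the cell starts with a trigger character. */
    static String neutralize(String cell) {
        if (cell == null || cell.isEmpty()) {
            return cell;
        }
        return TRIGGERS.contains(cell.charAt(0)) ? "'" + cell : cell;
    }

    /** Neutralizes the cell, doubles embedded quotes, and wraps it in quotes. */
    static String toQuotedField(String cell) {
        String safe = neutralize(cell == null ? "" : cell);
        return '"' + safe.replace("\"", "\"\"") + '"';
    }
}
```

Note that `neutralize("-5")` also rewrites a perfectly legitimate negative number, and `neutralize("=A1+B1")` defuses an intended formula just as it defuses a malicious one. That is the all-or-nothing trade-off in concrete form.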

This is where Phil's comment comes in:

"Maybe I'm misinterpreting something but I thought that it could be made
possible to configure CSVFormat-object when writing the CSV data in a
way that any data with possibly corrupting values (as shown on the OWASP
page) will mask the whole contents of the cell."

First, let me stress again the risk:  The threat isn't masking cell contents, it's execution of normal logic in a malicious way.  This is the €1M question:  "How do we differentiate corrupting values from valid values?"

Asking this csv library to do it means it has to take on quite a bit of intelligence.  It doesn't just have to understand what a CSV format is anymore.  It has to answer questions like "What's a corrupt equal sign look like?"  And it looks like a valid equal sign.  So to do this right, you have to do lexical analysis and parsing the same way that Excel is going to do it, and THEN you have to infer behavior.

Therefore, to determine what corrupt characters look like given data designed to be executed, you are now in the business of trying to interpret what the Excel formula is doing, in order to determine whether or not it's safe.  This is the core problem:  formulas are bits of user-supplied code designed to be executed.  If you escape it, you break it.  At best, you annoy the hell out of the accountant who was expecting your web app to offer a usable spreadsheet, while adding one layer of manual intervention on top of the standard warning that MS Office provides whenever you open an Excel file not created on your machine.

So... what can we do about it?  Microsoft already did it:

[inline image; not preserved in this plain-text archive]

IMHO there's nothing that any intermediary library can do that's any better than this.    Web applications designed to take spreadsheets as input are special beasts.  The proper security rule of thumb is to always ensure DATA is treated as DATA.  But that rule gets really funky when that DATA is actually supposed to be executable code.  But that's your choice:  if you don't want it to execute you have to force it to be data, which will break execution by programmer intent.

However, I suspect a few of you will be unhappy with my "do nothing" suggestion and insist that something ought to be done.

I would recommend writing a CSV encoder for the owasp-java-encoder project.  https://github.com/OWASP/owasp-java-encoder The framework is already in place and it's where I push people if they only need encoding functions.

Why I wouldn't do it here:  libraries like this have to be written to the lowest common denominator, meaning csv format projects that don't have Excel as a target.  You want security functions to process as close to the business logic as possible, and this is the wrong target for that.  Doing it here means not breaking legacy code, which means by default, the option will be off.  (Or you follow a deprecation strategy.)  Further--this gets to my original hint about threat models--executing formulas in cells is a desired function of Excel and its copies.  When developers start breaking spreadsheets they're going to revert to legacy behavior, meaning you're really talking about improving the defensive capability for the security-minded developers that can stand up to the finance department.  When OWASP tells you "This attack is difficult to mitigate," it isn't just the technical issues involved--which I just outlined--it's social.  This is why I'm hesitant to offer up "We'll do it in ESAPI," because I don't see the value-add in the bigger picture.  Plus, this is Microsoft's fault and I'm not thrilled with writing code to speedbump *their* problem, which I feel they've addressed as well as they ever will.



On 11/11/2021 4:36 AM, P. Ottlinger wrote:

Hi guys,

thanks for your reply.

Maybe I'm misinterpreting something but I thought that it could be made
possible to configure CSVFormat-object when writing the CSV data in a
way that any data with possibly corrupting values (as shown on the OWASP
page) will mask the whole contents of the cell.

Thus a library such as commons-csv would be able to lower the risk for
CSV injection and not every client/customer would have to manually
create this protecting logic.

To my mind it's a simple parser for "dangerous" tokens that quotes the
given data with additional " .... as we do not need to write
functioning Excel formulas into CSV.

WDYT?

Cheers,
Phil

On 10.11.21 at 20:53, Gary Gregory wrote:


I agree with Matt. CSV is just a container, it doesn't know or care what
the concept of a "formula" is.

Gary





Re: [csv] Does the library provide means to circumvent CSV injection

Posted by Bernd Eckenfels <ec...@zusammenkunft.net>.
I disagree. If you do a data export with text fields, you are normally sure there are no formulas in there, and you want to ensure custom text is escaped in a way that it does not break in a dangerous way on Excel import. This is a very legitimate use case for an exporter library, especially one which already has different output formats and column encodings.
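
A minimal sketch of the export-side escaping described here, under the assumption that every field is plain text and never an intended formula. It is written against the plain JDK only; `TextExport`, `encodeTextField` and `encodeRecord` are hypothetical names and are not part of Commons CSV. The trigger characters are the ones from the OWASP list quoted elsewhere in this thread, and the single-quote prefix is one commonly described way to make a spreadsheet show the value as text.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical export helper; not part of Commons CSV. */
final class TextExport {
    // Leading characters that a spreadsheet may treat as the start of a formula
    // (assumption: taken from the OWASP list quoted in this thread).
    private static final String TRIGGERS = "=+-@\t\r";

    private TextExport() {}

    /** Encodes one text-only field: neutralize a leading trigger, then quote. */
    static String encodeTextField(String value) {
        String v = (value == null) ? "" : value;
        if (!v.isEmpty() && TRIGGERS.indexOf(v.charAt(0)) >= 0) {
            v = "'" + v; // prefix so the cell is displayed as text, not evaluated
        }
        // Standard CSV quoting: wrap in double quotes, double any embedded quotes.
        return "\"" + v.replace("\"", "\"\"") + "\"";
    }

    /** Joins the encoded fields into a single comma-separated record. */
    static String encodeRecord(List<String> fields) {
        return fields.stream()
                .map(TextExport::encodeTextField)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(encodeRecord(List.of("Alice", "=1+1", "say \"hi\"")));
    }
}
```

This only fits exports where formulas are never wanted; a value such as -42 is prefixed as well, so numeric columns would need separate handling.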

Regards
Bernd


--
http://bernd.eckenfels.net

Re: [csv] Does the library provide means to circumvent CSV injection

Posted by Matt Seil <ms...@acm.org>.
The TLDR version:  OWASP's recommendation is specifically to render code 
intended to be executed as unexecutable.  I'd suggest a fix be done at 
OWASP-Java-Encoder project and not here.  I believe the suggestion of 
providing this feature even at OWASP has near-zero value in the long run 
because the purpose of formulas in Excel IS to be executed--and 
Microsoft already offers the best speed bump.  Here be dragons!

cc'ing my partner in crime.

============================

I apologize.  This is going to be a TLDR response because I don't know 
any of you professionally so I'm erring on the side of completeness.  
Sincere apologies if I'm stating things you believe to be obvious, or am 
myself ignorant of something obvious.

So I think there's a misunderstanding with regard to the threat described
by the OWASP article.  The threat is explicitly *FORMULA* execution in
Excel--and LibreOffice.  It sounds similar to a browser problem but it's
not; it's far worse.  The reason this particular threat tends to be
out of bounds in bug bounty programs and in CTF contests is that the
attack that exploits it is a social engineering attack, which always
works in the real world.  That's why bug bounties won't pay out for it.

The recommendation from OWASP is as follows:

Encode the offending characters to:

  * Equals to (=)
  * Plus (+)
  * Minus (-)
  * At (@)
  * Tab (0x09)
  * Carriage return (0x0D)
  * The set [;',"] be similarly escaped

While this would be a mitigation, it would also *purposefully
break any formulas* placed into a csv cell.  This is a critical
point, and I'll come back to it later.   It's all or nothing.
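
A minimal sketch of why it is all or nothing, assuming the single-quote prefix that is commonly described for this mitigation. `FormulaGuard` and its methods are hypothetical names used only for illustration; they are not part of Commons CSV, ESAPI or the OWASP Java Encoder.

```java
import java.util.Set;

/** Hypothetical helper illustrating the mitigation; not a real library API. */
final class FormulaGuard {
    // Leading characters from the list above.
    private static final Set<Character> TRIGGERS =
            Set.of('=', '+', '-', '@', '\t', '\r');

    private FormulaGuard() {}

    /** True if the cell starts with a character a spreadsheet may evaluate. */
    static boolean startsLikeFormula(String cell) {
        return cell != null && !cell.isEmpty() && TRIGGERS.contains(cell.charAt(0));
    }

    /** Prefixes a single quote so the value is shown as text, not evaluated. */
    static String neutralize(String cell) {
        return startsLikeFormula(cell) ? "'" + cell : cell;
    }

    public static void main(String[] args) {
        // A hostile value and a legitimate formula get exactly the same treatment.
        System.out.println(neutralize("=HYPERLINK(\"http://example.invalid\")"));
        System.out.println(neutralize("=SUM(A1:A3)"));
        // So does an ordinary negative number.
        System.out.println(neutralize("-42"));
        System.out.println(neutralize("plain text"));
    }
}
```

The check cannot tell a hostile formula from a wanted one, since both start with the same character, so the only choices are to escape everything or nothing.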

This is where Phil's comment comes in:

"Maybe I'm misinterpreting something but I thought that it could be made
possible to configure CSVFormat-object when writing the CSV data in a
way that any data with possibly corrupting values (as shown on the OWASP
page) will mask the whole contents of the cell."

First, let me stress again the risk:  The threat isn't masking cell
contents, it's *execution* of normal logic in a malicious way.  This is
the €1M question:  "How do we differentiate corrupting values from valid
values?"

Asking this csv library to do it means it has to take on quite a bit of 
intelligence.  It doesn't just have to understand what a CSV format is 
anymore.  It has to answer questions like "*What's a corrupt equal sign
look like?*"  And it looks like a valid equal sign.  So to do this
right, you have to do lexical analysis and parsing the same way that 
Excel is going to do it, and THEN you have to infer behavior.

Therefore, to determine what corrupt characters look like given data
designed to be executed, you are now in the business of trying to
interpret what the Excel formula is doing, in order to determine whether
or not it's safe.  This is the core problem: formulas are bits of
/user-supplied code/ *designed to be executed*.  If you escape it, you
break it.  At best, you annoy the hell out of the accountant who was
expecting your web app to offer a usable spreadsheet, while adding one
layer of manual intervention on top of the standard warning that MS
Office provides whenever you open an Excel file not created on your
machine.

So... what can we do about it?  Microsoft already did it:

[inline image; not preserved in this plain-text archive]

IMHO there's nothing that any intermediary library can do that's any 
better than this.    Web applications designed to take spreadsheets as 
input are special beasts.  The proper security rule of thumb is to 
always ensure DATA is treated as DATA.  But that rule gets *really 
funky* when that DATA is actually supposed to be executable code.  But 
that's your choice:  if you don't want it to execute you have to force 
it to be data, which will break execution by programmer intent.

However, I suspect a few of you will be unhappy with my "do nothing" 
suggestion and insist that something ought to be done.

I would recommend writing a CSV encoder for the owasp-java-encoder
project. https://github.com/OWASP/owasp-java-encoder The framework is
already in place and it's where I push people if they only need encoding
functions.

Why I wouldn't do it here:  libraries like this have to be written to 
the lowest-common-denominator, meaning csv format projects that don't 
have Excel as a target.  You want security functions to process as close 
to the business logic as possible, and this is the wrong target for 
that.  Doing it here means not breaking legacy code, which means by 
default, the option will be off.  (Or you follow a deprecation 
strategy.)  Further--this gets to my original hint about threat 
models--executing formulas in cells is a /desired function/ of Excel and 
its copies.  When developers start breaking spreadsheets they're going 
to revert to legacy behavior meaning you're really talking about 
improving the defensive capability for the security-minded developers 
that can stand up to the finance department.  When OWASP tells you "This
attack is difficult to mitigate," it isn't just the technical issues
involved--which I just outlined--it's social.  This is why I'm hesitant
to offer up "We'll do it in ESAPI," because I don't see the value-add in
the bigger picture.  Plus, *this is Microsoft's fault* and I'm not
thrilled with writing code to speedbump *their* problem, which I feel
they've addressed as well as they ever will.



On 11/11/2021 4:36 AM, P. Ottlinger wrote:
> Hi guys,
>
> thanks for your reply.
>
> Maybe I'm misinterpreting something but I thought that it could be made
> possible to configure CSVFormat-object when writing the CSV data in a
> way that any data with possibly corrupting values (as shown on the OWASP
> page) will mask the whole contents of the cell.
>
> Thus a library such as commons-csv would be able to lower the risk for
> CSV injection and not every client/customer would have to manually
> create this protecting logic.
>
> To my mind it's a simple parser for "dangerous" tokens that quotes the
> given data with additional " .... as we do not need to write
> functioning Excel formulas into CSV.
>
> WDYT?
>
> Cheers,
> Phil
>
> On 10.11.21 at 20:53, Gary Gregory wrote:
>> I agree with Matt. CSV is just a container, it doesn't know or care what
>> the concept of a "formula" is.
>>
>> Gary

Re: [csv] Does the library provide means to circumvent CSV injection

Posted by "P. Ottlinger" <po...@apache.org>.
Hi guys,

thanks for your reply.

Maybe I'm misinterpreting something but I thought that it could be made
possible to configure CSVFormat-object when writing the CSV data in a
way that any data with possibly corrupting values (as shown on the OWASP
page) will mask the whole contents of the cell.

Thus a library such as commons-csv would be able to lower the risk for
CSV injection and not every client/customer would have to manually
create this protecting logic.

To my mind it's a simple parser for "dangerous" tokens that quotes the
given data with additional " .... as we do not need to write
functioning Excel formulas into CSV.

WDYT?

Cheers,
Phil

On 10.11.21 at 20:53, Gary Gregory wrote:
> I agree with Matt. CSV is just a container, it doesn't know or care what
> the concept of a "formula" is.
> 
> Gary


Re: [csv] Does the library provide means to circumvent CSV injection

Posted by Gary Gregory <ga...@gmail.com>.
I agree with Matt. CSV is just a container, it doesn't know or care what
the concept of a "formula" is.

Gary


On Wed, Nov 10, 2021, 14:49 Matt Seil <ms...@acm.org> wrote:

> Hello,
>
> I'm Matt Seil, project co-lead for OWASP's ESAPI-Java-Legacy project.
>
> This email caught my attention.  In short, I don't think you're going to
> get an affirmative answer because the potential use cases are too
> numerous.  I'm totally speaking out of turn here, however; there may be
> something the maintainers are doing that I'm not aware of.
>
> As long as it's an acceptable practice to place formulas in CSV
> documents, the risk will exist.  The parsing library is the wrong part
> of the data chain to focus on a fix for an issue like this. The decision
> as to whether or not a formula is valid in a cell is business, industry,
> and application-specific, so this should be handled in the application.
>
> In this case, it would be in Excel itself where you'd want to focus your
> attention, as that's where the formula gets executed. If you can't
> control that, you move up the chain until you reach the part of the data
> flow you DO have full control over, and you work there.  IMHO, you've
> moved too far up the chain.
>
> On 11/10/2021 11:34 AM, P. Ottlinger wrote:
> > Hi,
> >
> > I just stumbled upon
> > https://owasp.org/www-community/attacks/CSV_Injection#
> > and asked myself if CommonsCSV provides a means to circumvent these kind
> > of attacks.
> >
> > If the library handles these special characters and prevents attacks
> > from working it should be mentioned on the homepage.
> >
> > If it doesn't handle I'd like to know how customers/users prevent these
> > kind of attacks. Maybe there's a working solution that can easily be
> > integrated into CommonsCSV?
> >
> > Thanks,
> > Phil
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
> For additional commands, e-mail: user-help@commons.apache.org
>
>

Re: [csv] Does the library provide means to circumvent CSV injection

Posted by Matt Seil <ms...@acm.org>.
Hello,

I'm Matt Seil, project co-lead for OWASP's ESAPI-Java-Legacy project.

This email caught my attention.  In short, I don't think you're going to
get an affirmative answer because the potential use cases are too
numerous.  I'm totally speaking out of turn here, however; there may be
something the maintainers are doing that I'm not aware of.

As long as it's an acceptable practice to place formulas in CSV 
documents, the risk will exist.  The parsing library is the wrong part 
of the data chain to focus on a fix for an issue like this. The decision 
as to whether or not a formula is valid in a cell is business, industry, 
and application-specific, so this should be handled in the application.
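
A rough sketch of what handling this in the application could look like: the application, not the CSV library, picks a per-export policy. `CellPolicy` and its `Mode` values are hypothetical names, and the set of trigger characters is an assumption taken from the OWASP page linked in the original question.

```java
/** Hypothetical application-level policy; not part of any CSV library. */
final class CellPolicy {
    /** What the application wants to do with cells that look like formulas. */
    enum Mode { ALLOW_FORMULAS, NEUTRALIZE, REJECT }

    // Assumed trigger characters (from the OWASP CSV Injection page).
    private static final String TRIGGERS = "=+-@\t\r";

    private CellPolicy() {}

    static String apply(Mode mode, String cell) {
        boolean risky = !cell.isEmpty() && TRIGGERS.indexOf(cell.charAt(0)) >= 0;
        if (!risky || mode == Mode.ALLOW_FORMULAS) {
            return cell; // the business case wants formulas, or nothing to do
        }
        if (mode == Mode.NEUTRALIZE) {
            return "'" + cell; // display as text instead of evaluating
        }
        throw new IllegalArgumentException("cell looks like a formula: " + cell);
    }

    public static void main(String[] args) {
        System.out.println(apply(Mode.ALLOW_FORMULAS, "=A1+B1"));
        System.out.println(apply(Mode.NEUTRALIZE, "=A1+B1"));
        try {
            apply(Mode.REJECT, "=A1+B1");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A finance export might run with ALLOW_FORMULAS while a dump of user-entered comments uses NEUTRALIZE or REJECT; that decision depends on the business case, which a generic library cannot know.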

In this case, it would be in Excel itself where you'd want to focus your 
attention, as that's where the formula gets executed. If you can't 
control that, you move up the chain until you reach the part of the data 
flow you DO have full control over, and you work there.  IMHO, you've 
moved too far up the chain.

On 11/10/2021 11:34 AM, P. Ottlinger wrote:
> Hi,
>
> I just stumbled upon
> https://owasp.org/www-community/attacks/CSV_Injection#
> and asked myself if CommonsCSV provides a means to circumvent these kind
> of attacks.
>
> If the library handles these special characters and prevents attacks
> from working it should be mentioned on the homepage.
>
> If it doesn't handle I'd like to know how customers/users prevent these
> kind of attacks. Maybe there's a working solution that can easily be
> integrated into CommonsCSV?
>
> Thanks,
> Phil
>
