You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@openoffice.apache.org by bu...@apache.org on 2016/04/02 17:34:14 UTC

[Issue 126901] New: CSV import: values with + or - followed by thousand separator and 3 digits (eg. +,123) are imported as a number

https://bz.apache.org/ooo/show_bug.cgi?id=126901

          Issue ID: 126901
        Issue Type: DEFECT
           Summary: CSV import: values with + or - followed by thousand
                    separator and 3 digits (eg. +,123) are imported as a
                    number
           Product: Calc
           Version: 4.2.0-dev
          Hardware: All
                OS: All
            Status: CONFIRMED
          Severity: Normal
          Priority: P5 (lowest)
         Component: open-import
          Assignee: issues@openoffice.apache.org
          Reporter: damjan@apache.org

Make a CSV file containing:
+,123

Open it in Calc.
Select "Separated by", set "Space" or anything other than "Comma" as the
separator, and deselect "Detect special numbers".
The value "123" is imported.

It should be imported as the string "+,123" instead, as numbers cannot begin
with thousand separators.

-- 
You are receiving this mail because:
You are the assignee for the issue.

[Issue 126901] CSV import: values with + or - followed by thousand separator and 3 digits (eg. +,123) are imported as a number

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=126901

Kay <ks...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kschenk@apache.org

-- 
You are receiving this mail because:
You are the assignee for the issue.

[Issue 126901] CSV import: values with + or - followed by thousand separator and 3 digits (eg. +,123) are imported as a number

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=126901

--- Comment #3 from damjan@apache.org ---
Created attachment 85384
  --> https://bz.apache.org/ooo/attachment.cgi?id=85384&action=edit
Don't allow thousand separators before digits

Proposed patch. Changes the allowed placement of the thousand separator in
ScStringUtil::parseSimpleNumber() to be after digits, instead of not first in
the string.

The patch is quite long because I added a unit test for it, which needed
changes to main/sc to export ScStringUtil from main/sc/inc/stringutil.hxx so
the unit test could link to it (I've emailed dev@ in a mail entitled "Exporting
symbols from AOO shared libraries" asking whether this is ok).

-- 
You are receiving this mail because:
You are the assignee for the issue.

[Issue 126901] CSV import: values with + or - followed by thousand separator and 3 digits (eg. +,123) are imported as a number

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=126901

--- Comment #2 from damjan@apache.org ---
(In reply to orcmid from comment #1)
> (In reply to damjan from comment #0)
> > Make a CSV file containing:
> > +,123
> [ ... ]
> > It should be imported as the string "+,123" instead, as numbers cannot begin
> > with thousand separators.
> 
> What happens if you use quotes around the string (that is, literally
> "+,123") the normal means for determining that a cell value is meant to be
> taken as text and not anything else?
> 
> The import scheme for text (including CSV) is rather generous and attempts
> to handle more cases than standard CSV, which is why there are all those
> parameters.  It also tends to find numbers in any surround and use that
> number. So it accepts cases that Calc never produces in CSV output when text
> fields are quoted.  
> 
> Because the input filter has so many variations, it is necessary to delimit
> text fields so they import properly always when it is known that
> conventional Excel CSV is intended.  The input filter should then be set
> accordingly.
> 
> I assume that having the text filter work in the way it does was considered
> a feature, and the few kinds of CSVs that Calc produces were simply lumped
> into cases of this more "general" feature.
> 
> EXPERIMENTS
> 
> Importing that file into Excel 2016 with a US locale creates two fields, one
> with the text "+" and another with the numeral 123.  I don't know what
> happens in locales where the comma is a decimal separator.  Same as +.123?
> 
> Using ",123" (with the quotes) instead works fine directly in Excel and will
> work fine in OpenOffice Calc once the Option "Quoted field as text" is
> checked for the text import.

I did an even better test in Excel 2007, which doesn't offer any options when
importing CSV. What I did is type '+,123 into a cell (the ' is necessary to get
Excel to stop complaining about a wrong formula), and then used "Text to
columns" on that cell with space as the separator. It did not split the cell,
and it remained a string. In Calc it would be converted to a number.

> I am confirming the behavior.  Whether it is a defect or by design is a
> bigger question.

I discovered this bug by reading the code, and then coming up with this test
case that will reproduce it, so I suspect it's a defect. Method
ScStringUtil::parseSimpleNumber() in main/source/core/tool/stringutil.cxx, the
"else if (c == gsep)" case in the "for" loop attempts to fail parsing the
number if the thousand separator appears in wrong places, such as after the
decimal separator, or not 3 digits apart. One of the tests is whether the
thousand separator is "// not allowed as the first character", and performed by
checking whether "i == 0", which is not strict enough, because a '+' or a '-'
can be the first character, and the thousand separator will still be wrongly
placed if it appears after it and before any digits. The test should be "not
allowed before digits" instead.

-- 
You are receiving this mail because:
You are the assignee for the issue.

[Issue 126901] CSV import: values with + or - followed by thousand separator and 3 digits (eg. +,123) are imported as a number

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=126901

--- Comment #4 from damjan@apache.org ---
Oh and selecting "Detect special numbers" in the CSV import screen correctly
imports values like +,123 as strings, since it bypasses
ScStringUtil::parseSimpleNumber() and uses the much more advanced parser in the
svl module. This also what leads me to believe that this is really a defect in
ScStringUtil::parseSimpleNumber().

-- 
You are receiving this mail because:
You are the assignee for the issue.

[Issue 126901] CSV import: values with + or - followed by thousand separator and 3 digits (eg. +,123) are imported as a number

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=126901

oooforum (fr) <oo...@free.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |oooforum@free.fr
         Issue Type|DEFECT                      |PATCH

--- Comment #5 from oooforum (fr) <oo...@free.fr> ---
Set status as PATCH :-)

-- 
You are receiving this mail because:
You are the assignee for the issue.

[Issue 126901] CSV import: values with + or - followed by thousand separator and 3 digits (eg. +,123) are imported as a number

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=126901

damjan@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Issue Type|PATCH                       |DEFECT
         Resolution|---                         |FIXED
   Target Milestone|---                         |4.2.0
             Status|CONFIRMED                   |RESOLVED
             Latest|---                         |4.2.0-dev
    Confirmation in|                            |

--- Comment #7 from damjan@apache.org ---
I am happy with my fix and test, so patch committed, resolving fixed. Thank you
for your input!

-- 
You are receiving this mail because:
You are the assignee for the issue.

[Issue 126901] CSV import: values with + or - followed by thousand separator and 3 digits (eg. +,123) are imported as a number

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=126901

--- Comment #6 from SVN Robot <sv...@dev.null.org> ---
"damjan" committed SVN revision 1737624 into trunk:
#i126901# CSV import: values with + or - followed by thousand separator and

-- 
You are receiving this mail because:
You are the assignee for the issue.

[Issue 126901] CSV import: values with + or - followed by thousand separator and 3 digits (eg. +,123) are imported as a number

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=126901

orcmid <or...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |orcmid@apache.org

--- Comment #1 from orcmid <or...@apache.org> ---
(In reply to damjan from comment #0)
> Make a CSV file containing:
> +,123
[ ... ]
> It should be imported as the string "+,123" instead, as numbers cannot begin
> with thousand separators.

What happens if you use quotes around the string (that is, literally "+,123")
the normal means for determining that a cell value is meant to be taken as text
and not anything else?

The import scheme for text (including CSV) is rather generous and attempts to
handle more cases than standard CSV, which is why there are all those
parameters.  It also tends to find numbers in any surround and use that number.
So it accepts cases that Calc never produces in CSV output when text fields are
quoted.  

Because the input filter has so many variations, it is necessary to delimit
text fields so they import properly always when it is known that conventional
Excel CSV is intended.  The input filter should then be set accordingly.

I assume that having the text filter work in the way it does was considered a
feature, and the few kinds of CSVs that Calc produces were simply lumped into
cases of this more "general" feature.

EXPERIMENTS

Importing that file into Excel 2016 with a US locale creates two fields, one
with the text "+" and another with the numeral 123.  I don't know what happens
in locales where the comma is a decimal separator.  Same as +.123?

Using ",123" (with the quotes) instead works fine directly in Excel and will
work fine in OpenOffice Calc once the Option "Quoted field as text" is checked
for the text import.

I am confirming the behavior.  Whether it is a defect or by design is a bigger
question.

-- 
You are receiving this mail because:
You are the assignee for the issue.