Posted to issues@openoffice.apache.org by bu...@apache.org on 2016/01/22 23:52:39 UTC

[Issue 126805] New: Importing CSV where apostrophe is separator and contained in text causes remaining rows to not import

https://bz.apache.org/ooo/show_bug.cgi?id=126805

          Issue ID: 126805
        Issue Type: DEFECT
           Summary: Importing CSV where apostrophe is separator and
                    contained in text causes remaining rows to not import
           Product: Calc
           Version: 4.1.2
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: Normal
          Priority: P5 (lowest)
         Component: open-import
          Assignee: issues@openoffice.apache.org
          Reporter: thall91739@gmail.com

Created attachment 85264
  --> https://bz.apache.org/ooo/attachment.cgi?id=85264&action=edit
CSV file that can not be completely imported.

When importing a CSV file in which text and numeric values are enclosed in
apostrophes, and one of the text fields itself contains an apostrophe,
OpenOffice stops importing the remainder of the file once the Separator
Options are set to strip the apostrophes around the values. The attached CSV
file cannot be imported past row 124. This file imports correctly into
Microsoft Excel.

With the apostrophes hidden in the preview window at the bottom of the Text
Import window, scroll down to row 124 and then to the right: all remaining
rows are run together onto row 124.

-- 
You are receiving this mail because:
You are the assignee for the issue.

[Issue 126805] Importing CSV where apostrophe is separator and contained in text causes remaining rows to not import

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=126805

damjan@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rob@psdata.cc

--- Comment #5 from damjan@apache.org ---
*** Issue 123831 has been marked as a duplicate of this issue. ***

[Issue 126805] Importing CSV where apostrophe is separator and contained in text causes remaining rows to not import

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=126805

damjan@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |damjan@apache.org

--- Comment #3 from damjan@apache.org ---
Excel 2007 imports the fields enclosed in '. Apache Commons CSV can also
parse all the lines in this format. AOO parses this format too as long as the
"Text delimiter" is not ', although it strips the leading ', which seems to be
a separate bug.

What is very impressive is Excel's behaviour when ' is replaced with ". It
correctly imports all lines, with the extraneous " inside the text removed. I
have no idea how it does this; it must use advanced column and field
auto-detection heuristics or something. Neither AOO nor Apache Commons CSV is
able to get past line 124.

[Issue 126805] Importing CSV where apostrophe is separator and contained in text causes remaining rows to not import

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=126805

mroe <mr...@gmx.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |CONFIRMED
     Ever confirmed|0                           |1

--- Comment #1 from mroe <mr...@gmx.net> ---
I can confirm the issue with the attached sample file (row 124).

The same problem occurs when the CSV file is used as a Base data source.

[Issue 126805] Importing CSV where apostrophe is separator and contained in text causes remaining rows to not import

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=126805

--- Comment #7 from damjan@apache.org ---
(In reply to Kay from comment #6)
> Results of my test with Linux-32 r. 1739631
> 
> The file imports completely. However line 124, second field from the
> original file as:
> 
> 'ARLINGTON INT'L AVIATION'
> 
> gets imported as:
> 
> ARLINGTON INTL AVIATION'
> 
> when I use the following as CSV import specifications:
> 
> * text separator -- comma
> * text delimiter -- ' (single quote)
> 
> So the internal quote in INT'L gets dropped.

Which is exactly what Excel does in its "Text to columns" with that line, and
which is how it generally works: fields starting with the quote character
continue until the first quote character without an adjacent quote character,
and then any subsequent text is appended as it is, without considering quoting
at all.
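The rule described above can be sketched as a small single-line field splitter. This is a minimal Python sketch assuming ',' as the field separator and ' as the text delimiter; `split_fields` is a hypothetical name, and this is not AOO's or Excel's actual code. It also assumes the record has already been isolated to one line, so quoted fields spanning line breaks are not handled.

```python
def split_fields(line: str, sep: str = ",", quote: str = "'") -> list[str]:
    """Split one record using the quoting rule described above.

    A field that starts with `quote` runs until the first quote character
    that is not doubled; a doubled quote ('') yields one literal quote.
    Anything after that closing quote, up to the next separator, is
    appended verbatim with no further quote handling.
    """
    fields = []
    i, n = 0, len(line)
    while True:
        buf = []
        if i < n and line[i] == quote:
            i += 1  # skip the opening quote
            while i < n:
                if line[i] == quote:
                    if i + 1 < n and line[i + 1] == quote:
                        buf.append(quote)  # doubled quote -> literal quote
                        i += 2
                        continue
                    i += 1  # first undoubled quote closes quoted mode
                    break
                buf.append(line[i])
                i += 1
        # Unquoted field, or leftover text after the closing quote:
        # copy everything up to the next separator as-is.
        while i < n and line[i] != sep:
            buf.append(line[i])
            i += 1
        fields.append("".join(buf))
        if i >= n:
            return fields
        i += 1  # skip the separator and start the next field


print(split_fields("'ARLINGTON IN','ARLINGTON INT'L AVIATION','150227'"))
```

Applied to the start of the original line 124, the second field comes out as ARLINGTON INTL AVIATION', which matches the result reported in comment #6: the quote after INT closes the quoted part, and "L AVIATION'" is then appended literally.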

[Issue 126805] Importing CSV where apostrophe is separator and contained in text causes remaining rows to not import

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=126805

Kay <ks...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kschenk@apache.org

--- Comment #6 from Kay <ks...@apache.org> ---
Results of my test with Linux-32 r. 1739631

The file imports completely. However, the second field of line 124, which
reads in the original file as:

'ARLINGTON INT'L AVIATION'

gets imported as:

ARLINGTON INTL AVIATION'

when I use the following CSV import settings:

* separator -- comma
* text delimiter -- ' (single quote)

So the internal quote in INT'L gets dropped.

[Issue 126805] Importing CSV where apostrophe is separator and contained in text causes remaining rows to not import

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=126805

damjan@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.2.0
         Resolution|---                         |FIXED
             Status|CONFIRMED                   |RESOLVED
             Latest|---                         |4.2.0-dev
    Confirmation in|                            |

--- Comment #4 from damjan@apache.org ---
I didn't intend to fix this bug with the patch in r1739628, but I did. The
problem was that the CSV line parser kept reading and concatenating lines
until it had seen an even number of quote characters, which never happened.
It now splits lines using the same logic as the CSV field parsers, and the
behaviour also matches Excel's.
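To illustrate why a single stray apostrophe swallowed the rest of the file, here is a minimal Python sketch contrasting a quote-parity line reader with a reader that tracks quoting the way a field parser does. All function names are hypothetical, and the first function is a reconstruction from the description above, not the code replaced in r1739628. ',' is assumed as the separator and ' as the text delimiter.

```python
def read_records_by_quote_parity(lines, quote="'"):
    """Hypothetical reconstruction of the old behaviour: keep joining
    physical lines until the buffer holds an even number of quotes."""
    records, buf = [], None
    for line in lines:
        buf = line if buf is None else buf + "\n" + line
        if buf.count(quote) % 2 == 0:
            records.append(buf)
            buf = None
    if buf is not None:
        # End of input reached while the quote count is still odd.
        records.append(buf)
    return records


def ends_inside_quotes(text, sep=",", quote="'"):
    """Scan `text` with field-parser rules and report whether it ends
    inside an open quoted field (so the record must continue)."""
    in_quotes = False
    at_field_start = True
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if in_quotes:
            if c == quote:
                if i + 1 < n and text[i + 1] == quote:
                    i += 2  # doubled quote: literal quote, still quoted
                    continue
                in_quotes = False  # first undoubled quote closes the field
        elif at_field_start and c == quote:
            in_quotes = True  # a quote only opens a field at its start
        elif c == sep or c == "\n":
            at_field_start = True
            i += 1
            continue
        at_field_start = False
        i += 1
    return in_quotes


def read_records_field_aware(lines, sep=",", quote="'"):
    """Join physical lines only while a quoted field is actually open."""
    records, buf = [], None
    for line in lines:
        buf = line if buf is None else buf + "\n" + line
        if not ends_inside_quotes(buf, sep, quote):
            records.append(buf)
            buf = None
    if buf is not None:
        records.append(buf)
    return records
```

A line like the original line 124 contains an odd number of apostrophes, and every ordinary row after it adds an even number, so the parity reader never reaches an even total and returns everything from that line onward as one record. The field-aware reader treats the apostrophe in INT'L as ordinary text, because it does not appear at the start of a field, and returns one record per row while still joining lines inside a genuinely open quoted field.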

Resolving fixed. Thank you for your bug report!

[Issue 126805] Importing CSV where apostrophe is separator and contained in text causes remaining rows to not import

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=126805

bmarcelly <ma...@club-internet.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marcelly@club-internet.fr

--- Comment #2 from bmarcelly <ma...@club-internet.fr> ---
If the apostrophe is used as the text delimiter, a single apostrophe cannot
appear within a text field. This is a matter of logic.

Within a text field, any apostrophe must be doubled.

Line 124 should be:
'ARLINGTON IN','ARLINGTON INT''L AVIATION','150227','12/18/2015','470','35','02/02/2016'

With this modification the entire file is imported correctly.

See also https://tools.ietf.org/html/rfc4180, section 2, point 7:
  If double-quotes are used to enclose fields, then a double-quote
  appearing inside a field must be escaped by preceding it with
  another double quote.  For example:
       "aaa","b""bb","ccc"
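As an illustration of the doubling rule, Python's standard `csv` module (whose `doublequote` option defaults to true) reads the corrected line 124 with ' as the quote character and writes the embedded apostrophe back out doubled. This only demonstrates RFC 4180-style escaping with a non-default quote character; it is not how AOO's importer is implemented.

```python
import csv
import io

# Corrected line 124, with the doubled apostrophe in INT''L.
fixed = ("'ARLINGTON IN','ARLINGTON INT''L AVIATION','150227',"
         "'12/18/2015','470','35','02/02/2016'")

# Reading: '' inside a '-quoted field is unescaped to a single '.
row = next(csv.reader([fixed], quotechar="'"))
print(row[1])  # ARLINGTON INT'L AVIATION

# Writing: QUOTE_ALL encloses every field, and the embedded ' is doubled.
out = io.StringIO()
writer = csv.writer(out, quotechar="'", quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerow(row)
print(out.getvalue() == fixed + "\n")  # True
```

The round trip reproduces the corrected line exactly, which is the property that makes doubled quotes unambiguous for any parser that follows the RFC.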

[Issue 126805] Importing CSV where apostrophe is separator and contained in text causes remaining rows to not import

Posted by bu...@apache.org.
https://bz.apache.org/ooo/show_bug.cgi?id=126805

--- Comment #8 from Kay <ks...@apache.org> ---
(In reply to damjan from comment #7)
> (In reply to Kay from comment #6)
> > Results of my test with Linux-32 r. 1739631
> > 
> > The file imports completely. However line 124, second field from the
> > original file as:
> > 
> > 'ARLINGTON INT'L AVIATION'
> > 
> > gets imported as:
> > 
> > ARLINGTON INTL AVIATION'
> > 
> > when I use the following as CSV import specifications:
> > 
> > * text separator -- comma
> > * text delimiter -- ' (single quote)
> > 
> > So the internal quote in INT'L gets dropped.
> 
> Which is exactly what Excel does in its "Text to columns" with that line,
> and which is how it generally works: fields starting with the quote
> character continue until the first quote character without an adjacent quote
> character, and then any subsequent text is appended as it is, without
> considering quoting at all.

Ah. OK. I haven't used Excel in many, many years.
