Posted to issues@openoffice.apache.org by bu...@apache.org on 2019/01/27 14:46:49 UTC

[Issue 128019] New: Replace OpenOffice string implementation with Standard Library string implementation

https://bz.apache.org/ooo/show_bug.cgi?id=128019

          Issue ID: 128019
        Issue Type: DEFECT
           Summary: Replace OpenOffice string implementation with Standard
                    Library string implementation
           Product: General
           Version: 4.2.0-dev
          Hardware: All
                OS: All
            Status: CONFIRMED
          Severity: Normal
          Priority: P5 (lowest)
         Component: code
          Assignee: issues@openoffice.apache.org
          Reporter: petko@apache.org
  Target Milestone: ---

The goal is to use the Standard Library string template instead of our
own implementation.

-- 
You are receiving this mail because:
You are the assignee for the issue.

[Issue 128019] Replace OpenOffice string implementation with Standard Library string implementation

Posted by bu...@apache.org.

https://bz.apache.org/ooo/show_bug.cgi?id=128019

--- Comment #3 from damjan@apache.org ---
We have C string structs and C++ string wrapper classes around those, found in
main/sal, in ASCII and "Unicode" (UTF-16) versions, with 2^32 chars max length.

Another two are in main/tools, with 2^16 chars max length, used by Calc,
StarBasic, and possibly more. Keeping the string length in a 16-bit instead of
a 32-bit field probably saves a lot of space in spreadsheets with lots of
cells; Excel also does this.

Apart from being based on sal_Char / sal_Unicode instead of native C++ types,
they contain many functions not found in C++ standard library strings, e.g.
conversion to/from integer and double, string tokenization, interning,
comparison of Unicode strings against ASCII, etc.
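
As a rough illustration of what those extras could look like on top of
standard library types, here is a minimal C++17 sketch. The helper names
(toInt32, tokenize, equalsAscii) are hypothetical and are not the existing
sal or tools functions; they only show that such features can be layered over
std::string_view and std::u16string_view.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical helper: parse a decimal integer, returning std::nullopt
// if the text is empty, malformed, out of range, or has trailing characters.
std::optional<std::int32_t> toInt32(std::string_view text) {
    std::int32_t value = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;
    }
    return value;
}

// Hypothetical helper: split on a single separator, keeping empty tokens.
std::vector<std::string> tokenize(std::string_view text, char separator) {
    std::vector<std::string> tokens;
    std::size_t start = 0;
    while (true) {
        std::size_t end = text.find(separator, start);
        if (end == std::string_view::npos) {
            tokens.emplace_back(text.substr(start));
            return tokens;
        }
        tokens.emplace_back(text.substr(start, end - start));
        start = end + 1;
    }
}

// Hypothetical helper: compare a UTF-16 string against an ASCII string,
// code unit by code unit.
bool equalsAscii(std::u16string_view text, std::string_view ascii) {
    if (text.size() != ascii.size()) {
        return false;
    }
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] != static_cast<unsigned char>(ascii[i])) {
            return false;
        }
    }
    return true;
}
```

For example, tokenize("a;b;;c", ';') yields four tokens, the third of which is
empty, and toInt32("4x2") yields no value.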

Given the recent move to UTF-8-only languages (Go, Rust) and the UTF-8
Everywhere manifesto (https://utf8everywhere.org), we could consider
eliminating the UTF-16 strings and using the ASCII strings as UTF-8. That
would, however, require fixing all code to traverse code points instead of
code units, something it probably gets wrong already.
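
To make the code point versus code unit distinction concrete, here is a
minimal C++17 sketch. Both function names are hypothetical, and both functions
assume well-formed input; they do not validate the encoding.

```cpp
#include <cstddef>
#include <string_view>

// Count code points in text assumed to be valid UTF-8 by counting every
// byte that is not a continuation byte (bit pattern 10xxxxxx).
std::size_t countCodePointsUtf8(std::string_view utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// Count code points in text assumed to be well-formed UTF-16 by skipping
// low surrogates (0xDC00..0xDFFF), which only occur as the second half
// of a surrogate pair.
std::size_t countCodePointsUtf16(std::u16string_view utf16) {
    std::size_t count = 0;
    for (char16_t unit : utf16) {
        if (unit < 0xDC00 || unit > 0xDFFF) {
            ++count;
        }
    }
    return count;
}
```

In both encodings size() reports code units, so a single character outside
the ASCII range (UTF-8) or outside the Basic Multilingual Plane (UTF-16)
makes the two counts diverge.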


[Issue 128019] Replace OpenOffice string implementation with Standard Library string implementation

Posted by bu...@apache.org.

https://bz.apache.org/ooo/show_bug.cgi?id=128019

jeffooo <je...@orange.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jeffooo@orange.fr


[Issue 128019] Replace OpenOffice string implementation with Standard Library string implementation

Posted by bu...@apache.org.

https://bz.apache.org/ooo/show_bug.cgi?id=128019

--- Comment #2 from Peter <pe...@apache.org> ---
Do we need 6? Shouldn't one be enough?


[Issue 128019] Replace OpenOffice string implementation with Standard Library string implementation

Posted by bu...@apache.org.

https://bz.apache.org/ooo/show_bug.cgi?id=128019

--- Comment #5 from Peter <pe...@apache.org> ---
I like the UTF-8 approach as described on https://utf8everywhere.org/ but I
don't have much insight into the alternatives.

I think we should decouple the string implementation from OpenOffice. This
would make this part easier to change and maintain.

We would also need valid converters for the other UTF encodings. I think it
may make sense to base the string implementation on the STL, and then have our
own string class that adds the features we need, hidden behind an interface.
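
A minimal C++17 sketch of that idea, under stated assumptions: the class name
OoString and the function utf16ToUtf8 are hypothetical, the storage is a
std::string holding UTF-8, and unpaired surrogates are replaced with U+FFFD.
This is one possible shape, not a proposal for the actual API.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical converter: UTF-16 to UTF-8.
// Unpaired surrogates are replaced by U+FFFD (an assumption of this sketch).
std::string utf16ToUtf8(std::u16string_view in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a high and low surrogate into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical wrapper: callers see only this interface, while the
// std::string storage stays a private implementation detail.
class OoString {
public:
    OoString() = default;
    explicit OoString(std::string utf8) : data_(std::move(utf8)) {}

    static OoString fromUtf16(std::u16string_view text) {
        return OoString(utf16ToUtf8(text));
    }

    const std::string& utf8() const { return data_; }
    std::size_t byteLength() const { return data_.size(); }

    bool startsWith(std::string_view prefix) const {
        return data_.compare(0, prefix.size(), prefix) == 0;
    }

private:
    std::string data_;
};
```

Because the storage is private, the underlying representation could be
swapped later without touching callers, which is the decoupling argued for
above.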

We also have to check the API. I think this is the most hideous part, based on
the FOSDEM presentation: https://ftp.fau.de/fosdem/2018/AW1.120/ode_uri.mp4
The UTF-related part starts at around 10 minutes. Very interesting talk, thanks
to Stephan Bergmann.


[Issue 128019] Replace OpenOffice string implementation with Standard Library string implementation

Posted by bu...@apache.org.

https://bz.apache.org/ooo/show_bug.cgi?id=128019

--- Comment #4 from Peter <pe...@apache.org> ---
I added issue 67649 for reference, because a string overhaul has a high
chance of fixing that bug as well (IMHO).


[Issue 128019] Replace OpenOffice string implementation with Standard Library string implementation

Posted by bu...@apache.org.

https://bz.apache.org/ooo/show_bug.cgi?id=128019

damjan@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |damjan@apache.org

--- Comment #1 from damjan@apache.org ---
Which one? We have 6 string implementations:
https://wiki.openoffice.org/wiki/Hacking#Can_I_get_a_char_.2A.2C_please.3F


[Issue 128019] Replace OpenOffice string implementation with Standard Library string implementation

Posted by bu...@apache.org.

https://bz.apache.org/ooo/show_bug.cgi?id=128019

Peter <pe...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |67649


Referenced Issues:

https://bz.apache.org/ooo/show_bug.cgi?id=67649
[Issue 67649] concordance files + multilingual entries; utf-8 don't work