Posted to issues@spark.apache.org by "Reynold Xin (JIRA)" <ji...@apache.org> on 2015/07/21 08:49:04 UTC

[jira] [Created] (SPARK-9213) Improve regular expression performance

Reynold Xin created SPARK-9213:
----------------------------------

             Summary: Improve regular expression performance
                 Key: SPARK-9213
                 URL: https://issues.apache.org/jira/browse/SPARK-9213
             Project: Spark
          Issue Type: Umbrella
          Components: SQL
            Reporter: Reynold Xin


I'm creating an umbrella ticket to improve regular expression performance for string expressions. Right now our use of regular expressions is inefficient for two reasons:

1. Java regex in general is slow.
2. We have to convert everything from UTF-8 encoded bytes into a Java String, run the regex on it, and then convert the result back.
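
To make the second point concrete, here is a minimal sketch of the round trip described above, using only the JDK. The class and method names (RegexRoundTrip, replaceAllUtf8) are hypothetical and are not Spark APIs; the sketch only illustrates the decode, match, encode pattern, not how Spark's string expressions are actually implemented.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class RegexRoundTrip {
    // Hypothetical helper for illustration only; not part of Spark.
    static byte[] replaceAllUtf8(byte[] utf8Input, Pattern pattern, String replacement) {
        // Step 1: decode the UTF-8 bytes into a Java String.
        // This allocates a new object and copies the data.
        String decoded = new String(utf8Input, StandardCharsets.UTF_8);

        // Step 2: run java.util.regex on the decoded String.
        String replaced = pattern.matcher(decoded).replaceAll(replacement);

        // Step 3: encode the result back into UTF-8 bytes (another copy).
        return replaced.getBytes(StandardCharsets.UTF_8);
    }
}
```

Each call pays for two full copies of the data in addition to the matching itself, which is the overhead a regex engine operating directly on UTF-8 bytes would avoid.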

There are libraries in Java that provide regex support directly on UTF-8 encoded bytes. One prominent example is joni, used in JRuby.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
