Posted to issues@spark.apache.org by "Adrien Lavoillotte (JIRA)" <ji...@apache.org> on 2017/09/04 08:19:00 UTC

[jira] [Commented] (SPARK-21850) SparkSQL cannot perform LIKE someColumn if someColumn's value contains a backslash \

    [ https://issues.apache.org/jira/browse/SPARK-21850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16152241#comment-16152241 ] 

Adrien Lavoillotte commented on SPARK-21850:
--------------------------------------------

The behaviour does indeed seem logical if LIKE takes the actual column value without escaping it. I was able to replicate it in some other DBs as well (our earlier tests were wrong, since each DB has its own rules for escaping \), although those DBs simply don't match instead of failing, which is arguably preferable. I'll close the issue, thank you for your help!
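
For reference, a possible workaround (a sketch only, assuming the default behaviour where the SQL parser unescapes backslashes in string literals): double every backslash in the column value before it is used as the LIKE pattern, e.g. with regexp_replace, so that a raw \ in the data becomes the \\ sequence that LIKE accepts:

{code:none}
// Workaround sketch (not tested against this exact table): escape backslashes
// in c_string before building the LIKE pattern. The number of backslashes below
// assumes the parser unescapes string literals (the default).
spark.sql("""
  select *
  from `test`.`types_basic` a
  where a.`c_string` LIKE CONCAT(regexp_replace(a.`c_string`, '\\\\', '\\\\\\\\'), '%')
""").show()
{code}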

> SparkSQL cannot perform LIKE someColumn if someColumn's value contains a backslash \
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-21850
>                 URL: https://issues.apache.org/jira/browse/SPARK-21850
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Adrien Lavoillotte
>
> I have a test table looking like this:
> {code:none}
> spark.sql("select * from `test`.`types_basic`").show()
> {code}
> ||id||c_tinyint|| [...] || c_string||
> |  0|     -128| [...] |              string|
> |  1|        0| [...] |string 'with' "qu...|
> |  2|      127| [...] |      unicod€ strĭng|
> |  3|       42| [...] |there is a \n in ...|
> |  4|     null| [...] |                null|
> Note the line with ID 3, which has a literal \n in c_string (i.e. a backslash followed by "n", as in "some \\n string", not a line break). I would like to join another table using a LIKE condition (to join on a prefix). If I do this:
> {code:none}
> spark.sql("select * from `test`.`types_basic` a where a.`c_string` LIKE CONCAT(a.`c_string`, '%')").show()
> {code}
> I get the following error in Spark 2.2 (but not in any earlier version):
> {noformat}
> 17/08/28 12:47:38 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 9.0 (TID 12, cdh5.local, executor 2): org.apache.spark.sql.AnalysisException: the pattern 'there is a \n in this line%' is invalid, the escape character is not allowed to precede 'n';
> 	at org.apache.spark.sql.catalyst.util.StringUtils$.fail$1(StringUtils.scala:42)
> 	at org.apache.spark.sql.catalyst.util.StringUtils$.escapeLikeRegex(StringUtils.scala:51)
> 	at org.apache.spark.sql.catalyst.util.StringUtils.escapeLikeRegex(StringUtils.scala)
> 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
> 	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> 	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
> 	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
> 	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:108)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:748)
> {noformat}
> It seems to me that if LIKE requires special escaping there, then SparkSQL should apply that escaping to the column's value itself.
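> A minimal sketch of the kind of check that appears to produce this error (an illustration only, not the actual Spark source): the LIKE-to-regex translation rejects a backslash that precedes anything other than '_', '%' or '\':
> {code:none}
> // Illustration only (not Spark's implementation): a simplified LIKE-pattern
> // translation that rejects '\' followed by anything other than '_', '%' or '\',
> // which is what the error above reports for the raw '\n' in the column value.
> import java.util.regex.Pattern
>
> def likeToRegex(pattern: String): String = {
>   val out = new StringBuilder("(?s)")
>   var i = 0
>   while (i < pattern.length) {
>     pattern.charAt(i) match {
>       case '\\' if i + 1 < pattern.length =>
>         pattern.charAt(i + 1) match {
>           case c @ ('_' | '%' | '\\') => out ++= Pattern.quote(c.toString); i += 2
>           case c => throw new IllegalArgumentException(
>             s"the pattern '$pattern' is invalid, the escape character is not allowed to precede '$c'")
>         }
>       case '\\' => throw new IllegalArgumentException(
>         s"the pattern '$pattern' is invalid, it cannot end with the escape character")
>       case '_' => out ++= "."; i += 1
>       case '%' => out ++= ".*"; i += 1
>       case c   => out ++= Pattern.quote(c.toString); i += 1
>     }
>   }
>   out.result()
> }
>
> // likeToRegex("there is a \\n in this line%") fails because '\' precedes 'n'.
> {code}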



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org