Posted to issues@spark.apache.org by "Sameer Agarwal (JIRA)" <ji...@apache.org> on 2016/07/12 00:50:10 UTC

[jira] [Created] (SPARK-16488) Codegen variable namespace collision for pmod and partitionBy

Sameer Agarwal created SPARK-16488:
--------------------------------------

             Summary: Codegen variable namespace collision for pmod and partitionBy
                 Key: SPARK-16488
                 URL: https://issues.apache.org/jira/browse/SPARK-16488
             Project: Spark
          Issue Type: Bug
          Components: SQL
            Reporter: Sameer Agarwal


Reported by [~brkyvz]. Original description below:

The generated code used by `pmod` conflicts with `DataFrameWriter.partitionBy`.

Quick repro:
{code}
import org.apache.spark.sql.functions._
case class Test(a: Int, b: String)

// createOrReplaceTempView returns Unit, so there is nothing useful to bind to a val
Seq(Test(0, "a"), Test(1, "b"), Test(1, "a")).toDS.createOrReplaceTempView("test")
sql("""
select 
  *
from
  test
distribute by
  pmod(a, 2)
""")
  .write
  .partitionBy("b")
  .mode("overwrite")
  .parquet("/tmp/repro")
{code}

The same collision can be reproduced through the DataFrame API by calling `repartition` with a `pmod` expression instead of using `pmod` inside `distribute by` in SQL.

Example generated code (`int r` is declared twice in the same method scope, at lines 045 and 055, which is a duplicate local variable in Java):
{code}
/* 025 */   public UnsafeRow apply(InternalRow i) {
/* 026 */     int value1 = 42;
/* 027 */
/* 028 */     boolean isNull2 = i.isNullAt(0);
/* 029 */     UTF8String value2 = isNull2 ? null : (i.getUTF8String(0));
/* 030 */     if (!isNull2) {
/* 031 */       value1 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(value2.getBaseObject(), value2.getBaseOffset(), value2.numBytes(), value1);
/* 032 */     }
/* 033 */
/* 034 */
/* 035 */     int value4 = 42;
/* 036 */
/* 037 */     boolean isNull5 = i.isNullAt(1);
/* 038 */     UTF8String value5 = isNull5 ? null : (i.getUTF8String(1));
/* 039 */     if (!isNull5) {
/* 040 */       value4 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(value5.getBaseObject(), value5.getBaseOffset(), value5.numBytes(), value4);
/* 041 */     }
/* 042 */
/* 043 */     int value3 = -1;
/* 044 */
/* 045 */     int r = value4 % 10;
/* 046 */     if (r < 0) {
/* 047 */       value3 = (r + 10) % 10;
/* 048 */     } else {
/* 049 */       value3 = r;
/* 050 */     }
/* 051 */     value1 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashInt(value3, value1);
/* 052 */
/* 053 */     int value = -1;
/* 054 */
/* 055 */     int r = value1 % 200;
/* 056 */     if (r < 0) {
/* 057 */       value = (r + 200) % 200;
/* 058 */     } else {
/* 059 */       value = r;
/* 060 */     }
/* 061 */     rowWriter.write(0, value);
/* 062 */     return result;
/* 063 */   }
/* 064 */ }
{code}
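
To illustrate why the two `int r` declarations collide and how unique names avoid it, here is a minimal, self-contained sketch. It is not Spark's code generator: `FreshNameSketch`, `freshName`, and `genPmod` are hypothetical names made up for this example, and the emitted snippet only mirrors the shape of the pmod code shown above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not Spark's actual codegen API.
final class FreshNameSketch {
    // Tracks how many times each base name has been handed out.
    private final Map<String, Integer> counters = new HashMap<>();

    // Returns a name that is unique for this instance: r0, r1, r2, ...
    String freshName(String base) {
        int id = counters.merge(base, 1, Integer::sum) - 1;
        return base + id;
    }

    // Emits Java source for a positive modulo of `input` by `n`,
    // storing the outcome in the already-declared variable `result`.
    // The temporary gets a fresh name instead of a hard-coded "r".
    String genPmod(String input, int n, String result) {
        String r = freshName("r");
        return "int " + r + " = " + input + " % " + n + ";\n"
             + "if (" + r + " < 0) {\n"
             + "  " + result + " = (" + r + " + " + n + ") % " + n + ";\n"
             + "} else {\n"
             + "  " + result + " = " + r + ";\n"
             + "}\n";
    }

    public static void main(String[] args) {
        FreshNameSketch ctx = new FreshNameSketch();
        // Two pmod expansions in the same method body, as in the dump above.
        System.out.print(ctx.genPmod("value4", 10, "value3"));
        System.out.print(ctx.genPmod("value1", 200, "value"));
    }
}
```

With a hard-coded temporary name, both expansions would declare `int r` in the same scope. In this sketch the first expansion declares `r0` and the second `r1`, so the two snippets can sit in one method body without clashing.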