Posted to user@phoenix.apache.org by Dawid Wysakowicz <wy...@gmail.com> on 2015/11/30 19:01:53 UTC
Problem with arrays in phoenix-spark
Hi,
I've recently run into some behaviour that looks buggy when working with
phoenix-spark and arrays.
Take a look at those unit tests:
test("Can save arrays from custom dataframes back to phoenix") {
  val dataSet = List(Row(2L, Array("String1", "String2", "String3")))

  val sqlContext = new SQLContext(sc)

  val schema = StructType(
    Seq(StructField("ID", LongType, nullable = false),
        StructField("VCARRAY", ArrayType(StringType))))

  val rowRDD = sc.parallelize(dataSet)

  // Apply the schema to the RDD.
  val df = sqlContext.createDataFrame(rowRDD, schema)

  df.write
    .format("org.apache.phoenix.spark")
    .options(Map("table" -> "ARRAY_TEST_TABLE", "zkUrl" -> quorumAddress))
    .mode(SaveMode.Overwrite)
    .save()
}
test("Can save arrays of AnyVal type back to phoenix") {
  val dataSet = List((2L, Array(1, 2, 3), Array(1L, 2L, 3L)))

  sc
    .parallelize(dataSet)
    .saveToPhoenix(
      "ARRAY_ANYVAL_TEST_TABLE",
      Seq("ID", "INTARRAY", "BIGINTARRAY"),
      zkUrl = Some(quorumAddress)
    )

  // Load the results back
  val stmt = conn.createStatement()
  val rs = stmt.executeQuery(
    "SELECT INTARRAY, BIGINTARRAY FROM ARRAY_ANYVAL_TEST_TABLE WHERE ID = 2")
  rs.next()
  val intArray = rs.getArray(1).getArray().asInstanceOf[Array[Int]]
  val longArray = rs.getArray(2).getArray().asInstanceOf[Array[Long]]

  // Verify the arrays are equal
  intArray shouldEqual dataSet(0)._2
  longArray shouldEqual dataSet(0)._3
}
Both tests fail with ClassCastExceptions.
In the attached patch I've proposed a solution. The tricky part is
Array[Byte], since the same Scala type corresponds to both VARBINARY and
TINYINT[].
Let me know if I should create an issue for this, and whether my solution
looks right to you.
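To illustrate the ambiguity: JVM arrays keep their element type at runtime, so most Scala array types can be mapped to a Phoenix array type just by inspecting the value itself. A minimal sketch of that idea (a hypothetical helper for illustration, not the code from the patch):

```scala
// Hypothetical sketch: map a Scala array to a Phoenix array type by
// matching on the array's runtime element type. JVM arrays are reified,
// so these instanceof-style checks work despite type erasure.
def phoenixArrayType(arr: Array[_]): Option[String] = arr match {
  case _: Array[Int]    => Some("INTEGER ARRAY")
  case _: Array[Long]   => Some("BIGINT ARRAY")
  case _: Array[String] => Some("VARCHAR ARRAY")
  // Array[Byte] is the ambiguous case: the same bytes could be a
  // VARBINARY scalar or a TINYINT ARRAY, so the value alone cannot
  // decide; the target column's declared type has to break the tie.
  case _: Array[Byte]   => None
  case _                => None
}
```

Every case except Array[Byte] can be resolved from the value alone, which is why that one needs the column metadata.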
Regards
Dawid Wysakowicz
Re: Problem with arrays in phoenix-spark
Posted by Dawid Wysakowicz <wy...@gmail.com>.
Sure, I have done that.
https://issues.apache.org/jira/browse/PHOENIX-2469
2015-11-30 22:22 GMT+01:00 Josh Mahonin <jm...@gmail.com>:
> Hi Dawid,
>
> Thanks for the bug report and the proposed patch. Please file a JIRA and
> we'll take the discussion there.
>
> Josh
>
> On Mon, Nov 30, 2015 at 1:01 PM, Dawid Wysakowicz <
> wysakowicz.dawid@gmail.com> wrote:
Re: Problem with arrays in phoenix-spark
Posted by Josh Mahonin <jm...@gmail.com>.
Hi Dawid,
Thanks for the bug report and the proposed patch. Please file a JIRA and
we'll take the discussion there.
Josh
On Mon, Nov 30, 2015 at 1:01 PM, Dawid Wysakowicz <
wysakowicz.dawid@gmail.com> wrote: