Posted to user@spark.apache.org by Raghava Mutharaju <m....@gmail.com> on 2016/02/09 16:07:55 UTC
Dataset joinWith condition
Hello All,
The joinWith() method in Dataset takes a condition of type Column. Without
converting a Dataset to a DataFrame, how can we get a specific column?
For example: case class Pair(x: Long, y: Long)
A and B are Datasets of type Pair, and I want to join A.x with B.y
A.joinWith(B, A.toDF().col("x") == B.toDF().col("y"))
Is there a way to avoid using toDF()?
I am running into the same issue with filter(A.x == B.y)
--
Regards,
Raghava
Re: Dataset joinWith condition
Posted by Raghava Mutharaju <m....@gmail.com>.
Thanks a lot Ted.
If the two columns are of different types, say Int and Long, would it then be
ds.select(expr("_2 / _1").as[(Int, Long)])?
Regards,
Raghava.
--
Regards,
Raghava
http://raghavam.github.io
Re: Dataset joinWith condition
Posted by Ted Yu <yu...@gmail.com>.
bq. I followed something similar $"a.x"
Please use expr("...")
e.g. if your Dataset has two columns, you can write:
ds.select(expr("_2 / _1").as[Int])
where _1 refers to first column and _2 refers to second.
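A runnable sketch of this, assuming a spark-shell session where spark.implicits._ is already in scope (the names ds and ratios are illustrative; note that Spark SQL division yields a double, hence as[Double] here rather than as[Int]):

```scala
// Sketch, for a spark-shell session: spark.implicits._ provides toDS().
import org.apache.spark.sql.functions.expr

// A two-column Dataset of tuples; the columns are named _1 and _2 by default.
val ds = Seq((2, 10), (4, 20)).toDS()

// expr() parses a SQL expression string into a Column; as[...] turns the
// untyped Column into a TypedColumn, so select() returns a typed Dataset.
val ratios = ds.select(expr("_2 / _1").as[Double])
ratios.show()
```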
Re: Dataset joinWith condition
Posted by Raghava Mutharaju <m....@gmail.com>.
Ted,
Thank you for the pointer. That works, but what does a string prefixed
with a $ sign mean? Is it an expression?
Could you also help me with the select() parameter syntax? I tried
something similar, $"a.x", and it gives an error message that a TypedColumn
is expected.
Regards,
Raghava.
--
Regards,
Raghava
http://raghavam.github.io
Re: Dataset joinWith condition
Posted by Ted Yu <yu...@gmail.com>.
Please take a look at:
sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
val ds1 = Seq(1, 2, 3).toDS().as("a")
val ds2 = Seq(1, 2).toDS().as("b")
checkAnswer(
ds1.joinWith(ds2, $"a.value" === $"b.value", "inner"),
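Applied to the original Pair example, the same aliasing trick avoids toDF() entirely. A minimal sketch, assuming a spark-shell session where spark.implicits._ is in scope (the names a, b, and joined are illustrative, and the sample data is made up):

```scala
// Sketch, for a spark-shell session: spark.implicits._ provides toDS() and $.
case class Pair(x: Long, y: Long)

// Alias each Dataset so its columns can be referenced by name without toDF().
val a = Seq(Pair(1L, 2L), Pair(3L, 4L)).toDS().as("a")
val b = Seq(Pair(9L, 1L), Pair(9L, 3L)).toDS().as("b")

// Note === (Column equality), not ==. joinWith keeps the typed values:
// the result is a Dataset[(Pair, Pair)], one tuple per matching pair.
val joined = a.joinWith(b, $"a.x" === $"b.y", "inner")
joined.show()
```

The same $"a.x" / $"b.y" columns work as a filter condition on the joined result, which is what the filter(A.x == B.y) question was after.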