Posted to user@spark.apache.org by Raghava Mutharaju <m....@gmail.com> on 2016/02/09 16:07:55 UTC

Dataset joinWith condition

Hello All,

The joinWith() method on Dataset takes a condition of type Column. Without
converting a Dataset to a DataFrame, how can we get a specific column?

For example: case class Pair(x: Long, y: Long)

A and B are Datasets of type Pair, and I want to join A.x with B.y:

A.joinWith(B, A.toDF().col("x") === B.toDF().col("y"))

Is there a way to avoid using toDF()?

I am having similar issues with the usage of filter(A.x === B.y).

-- 
Regards,
Raghava
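
For context, the workaround mentioned above written out as a minimal,
self-contained sketch (a hypothetical spark-shell style session, assuming
Spark 1.6 with sqlContext.implicits._ imported; Column equality is spelled
=== rather than ==):

    import sqlContext.implicits._

    case class Pair(x: Long, y: Long)

    val A = Seq(Pair(1L, 2L), Pair(3L, 4L)).toDS()
    val B = Seq(Pair(2L, 7L), Pair(9L, 1L)).toDS()

    // The toDF() round trip exposes col(), but it is exactly the step
    // the question would like to avoid.
    val joined = A.joinWith(B, A.toDF().col("x") === B.toDF().col("y"))
    // joined: Dataset[(Pair, Pair)]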

Re: Dataset joinWith condition

Posted by Raghava Mutharaju <m....@gmail.com>.
Thanks a lot, Ted.

If the two columns are of different types, say Int and Long, then will it
be ds.select(expr("_2 / _1").as[(Int, Long)])?

Regards,
Raghava.


-- 
Regards,
Raghava
http://raghavam.github.io
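
A speculative sketch of that follow-up (assuming Spark 1.6, where SQL
division on integral columns evaluates to a double, so the single result
column would be decoded with .as[Double] rather than a tuple encoder; a
shell session with sqlContext.implicits._ in scope is assumed):

    import org.apache.spark.sql.functions.expr
    import sqlContext.implicits._

    // A two-column Dataset where _1 is Int and _2 is Long.
    val ds = Seq((1, 10L), (2, 40L)).toDS()

    // expr("_2 / _1") yields one double-typed column, so the matching
    // decode type is Double, not an (Int, Long) tuple.
    val ratios = ds.select(expr("_2 / _1").as[Double])
    // ratios: Dataset[Double], e.g. 10.0 and 20.0 for the rows above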

Re: Dataset joinWith condition

Posted by Ted Yu <yu...@gmail.com>.
bq. I followed something similar $"a.x"

Please use expr("...").
For example, if your Dataset has two columns, you can write:

  ds.select(expr("_2 / _1").as[Int])

where _1 refers to the first column and _2 to the second.
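
A self-contained sketch of that pattern (a hypothetical Spark 1.6 shell
session with sqlContext.implicits._ in scope; an addition is used here so
the expression's result type lines up with the Int encoder):

    import org.apache.spark.sql.functions.expr
    import sqlContext.implicits._

    // A Dataset of two Int columns; tuple Datasets name them _1 and _2.
    val ds = Seq((1, 2), (3, 4)).toDS()

    // expr("...") parses a SQL expression string into a Column, and .as[Int]
    // turns it into a TypedColumn, so select returns a Dataset[Int].
    val sums = ds.select(expr("_1 + _2").as[Int])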

Re: Dataset joinWith condition

Posted by Raghava Mutharaju <m....@gmail.com>.
Ted,

Thank you for the pointer. That works, but what does a string prefixed
with a $ sign mean? Is it an expression?

Could you also help me with the select() parameter syntax? I followed
something similar, $"a.x", and it gives an error message saying that a
TypedColumn is expected.

Regards,
Raghava.


-- 
Regards,
Raghava
http://raghavam.github.io

Re: Dataset joinWith condition

Posted by Ted Yu <yu...@gmail.com>.
Please take a look at:
sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala

    val ds1 = Seq(1, 2, 3).toDS().as("a")
    val ds2 = Seq(1, 2).toDS().as("b")

    checkAnswer(
      ds1.joinWith(ds2, $"a.value" === $"b.value", "inner"),
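
Applying the same aliasing idea to the Pair example from the original
question, a rough, untested sketch (assuming Spark 1.6 with
sqlContext.implicits._ in scope):

    import sqlContext.implicits._

    case class Pair(x: Long, y: Long)

    val A = Seq(Pair(1L, 2L), Pair(3L, 4L)).toDS().as("a")
    val B = Seq(Pair(2L, 5L), Pair(6L, 1L)).toDS().as("b")

    // Aliasing the Datasets lets $-interpolated column names such as
    // $"a.x" resolve without round-tripping through toDF().
    val joined = A.joinWith(B, $"a.x" === $"b.y", "inner")
    // joined: Dataset[(Pair, Pair)]

    // For a predicate on a single Dataset, the typed filter overload in
    // 1.6 avoids Column references entirely.
    val filtered = A.filter((p: Pair) => p.x > 1L)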
