Posted to user@spark.apache.org by ram kumar <ra...@gmail.com> on 2016/05/17 16:39:57 UTC
Error joining dataframes
Hi,
I tried to join two dataframes:
df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")
df_join.registerTempTable("join_test")
When querying "Id" from "join_test":
0: jdbc:hive2://> select Id from join_test;
Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is
ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
0: jdbc:hive2://>
Is there a way to merge the values of df1("Id") and df2("Id") into one "Id" column?
Thanks
Re: Error joining dataframes
Posted by Takeshi Yamamuro <li...@gmail.com>.
You can use this API in spark-v1.6+:
https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L454
// maropu
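The linked overload, df1.join(df2, Seq("Id"), "fullouter") in 1.6+, takes the join column name plus a join type and emits the join column only once. The schema difference between the condition form and the using-column form can be sketched outside Spark in plain Python (the helper names below are hypothetical, not Spark APIs):

```python
def condition_join_schema(left_cols, right_cols):
    """Schema after df1.join(df2, df1("id") === df2("id"), ...):
    every column from both sides is kept, so "id" appears twice."""
    return left_cols + right_cols

def using_column_join_schema(left_cols, right_cols, using):
    """Schema after df1.join(df2, Seq("id"), ...): the join
    column appears only once, followed by the remaining columns."""
    return ([using]
            + [c for c in left_cols if c != using]
            + [c for c in right_cols if c != using])

print(condition_join_schema(["id", "A"], ["id", "B"]))
# ['id', 'A', 'id', 'B'] -- the ambiguous case from this thread
print(using_column_join_schema(["id", "A"], ["id", "B"], "id"))
# ['id', 'A', 'B'] -- a single, unambiguous "id"
```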
On Wed, May 18, 2016 at 3:16 PM, ram kumar <ra...@gmail.com> wrote:
> I tried
>
> scala> var df_join = df1.join(df2, "Id", "fullouter")
> <console>:27: error: type mismatch;
> found : String("Id")
> required: org.apache.spark.sql.Column
> var df_join = df1.join(df2, "Id", "fullouter")
> ^
>
> scala>
>
> And I cant see the above method in
>
> https://spark.apache.org/docs/1.5.1/api/java/org/apache/spark/sql/DataFrame.html#join(org.apache.spark.sql.DataFrame,%20org.apache.spark.sql.Column,%20java.lang.String)
>
> On Wed, May 18, 2016 at 2:22 AM, Bijay Kumar Pathak <bk...@mtu.edu>
> wrote:
>
>> Hi,
>>
>> Try this one:
>>
>>
>> df_join = df1.*join*(df2, 'Id', "fullouter")
>>
>> Thanks,
>> Bijay
>>
>>
>> On Tue, May 17, 2016 at 9:39 AM, ram kumar <ra...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I tried to join two dataframe
>>>
>>> df_join = df1.*join*(df2, ((df1("Id") === df2("Id")), "fullouter")
>>>
>>> df_join.registerTempTable("join_test")
>>>
>>>
>>> When querying "Id" from "join_test"
>>>
>>> 0: jdbc:hive2://> *select Id from join_test;*
>>> *Error*: org.apache.spark.sql.AnalysisException: Reference 'Id' is
>>> *ambiguous*, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
>>> 0: jdbc:hive2://>
>>>
>>> Is there a way to merge the value of df1("Id") and df2("Id") into one
>>> "Id"
>>>
>>> Thanks
>>>
>>
>>
>
--
---
Takeshi Yamamuro
Re: Error joining dataframes
Posted by ram kumar <ra...@gmail.com>.
I tried it.
e.g.:
df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")
+----+----+----+----+
| id| A| id| B|
+----+----+----+----+
| 1| 0|null|null|
| 2| 0| 2| 0|
|null|null| 3| 0|
+----+----+----+----+
If I try,
df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter").drop(df1("Id"))
+----+----+----+
| A| id| B|
+----+----+----+
| 0|null|null|
| 0| 2| 0|
|null| 3| 0|
+----+----+----+
The value "id" = 1 is lost.
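One common workaround on 1.5, instead of dropping one side, is to select coalesce(df1("Id"), df2("Id")) as the merged key, so rows present on only one side keep their id. As a plain-Python sketch of that semantics (full_outer_merge is a hypothetical helper, not a Spark API):

```python
def full_outer_merge(left, right, key):
    """Full outer join of two lists of dicts on `key`, coalescing
    the key from whichever side is present so no row loses it."""
    left_keys = {row[key] for row in left}
    right_by_key = {row[key]: row for row in right}
    out = []
    for row in left:
        match = right_by_key.get(row[key], {})
        out.append({**match, **row})  # key and left columns win on overlap
    for row in right:
        if row[key] not in left_keys:
            out.append(dict(row))     # key coalesced from the right side
    return out

df1 = [{"id": 1, "A": 0}, {"id": 2, "A": 0}]
df2 = [{"id": 2, "B": 0}, {"id": 3, "B": 0}]
joined = full_outer_merge(df1, df2, "id")
# ids 1, 2 and 3 all survive, each with a single "id" value
```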
Re: Error joining dataframes
Posted by Divya Gehlot <di...@gmail.com>.
Can you try var df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter").drop(df1("Id"))
Re: Error joining dataframes
Posted by ram kumar <ra...@gmail.com>.
I tried
scala> var df_join = df1.join(df2, "Id", "fullouter")
<console>:27: error: type mismatch;
found : String("Id")
required: org.apache.spark.sql.Column
var df_join = df1.join(df2, "Id", "fullouter")
^
scala>
And I can't see the above method in
https://spark.apache.org/docs/1.5.1/api/java/org/apache/spark/sql/DataFrame.html#join(org.apache.spark.sql.DataFrame,%20org.apache.spark.sql.Column,%20java.lang.String)
Re: Error joining dataframes
Posted by Takeshi Yamamuro <li...@gmail.com>.
Ah, yes. `df_join` has two `id` columns, so you need to select which `id` you use:
scala> :paste
// Entering paste mode (ctrl-D to finish)
val df1 = Seq((1, 0), (2, 0)).toDF("id", "A")
val df2 = Seq((2, 0), (3, 0)).toDF("id", "B")
val df3 = df1.join(df2, df1("id") === df2("id"), "outer")
df3.printSchema
df3.select(df1("id")).show
// Exiting paste mode, now interpreting.
root
|-- id: integer (nullable = true)
|-- A: integer (nullable = true)
|-- id: integer (nullable = true)
|-- B: integer (nullable = true)
+----+
| id|
+----+
| 1|
| 2|
|null|
+----+
Re: Error joining dataframes
Posted by ram kumar <ra...@gmail.com>.
When you register a temp table from the dataframe
eg:
var df_join = df1.join(df2, df1("id") === df2("id"), "outer")
df_join.registerTempTable("test")
sqlContext.sql("select * from test")
+----+----+----+----+
| id| A| id| B|
+----+----+----+----+
| 1| 0|null|null|
| 2| 0| 2| 0|
|null|null| 3| 0|
+----+----+----+----+
But when you query the "id":
sqlContext.sql("select id from test")
Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is
ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
Re: Error joining dataframes
Posted by Takeshi Yamamuro <li...@gmail.com>.
Looks weird; it seems spark-v1.5.x can accept the query.
What's the difference between this example and your query?
####
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
      /_/
scala> :paste
// Entering paste mode (ctrl-D to finish)
val df1 = Seq((1, 0), (2, 0)).toDF("id", "A")
val df2 = Seq((2, 0), (3, 0)).toDF("id", "B")
df1.join(df2, df1("id") === df2("id"), "outer").show
// Exiting paste mode, now interpreting.
+----+----+----+----+
| id| A| id| B|
+----+----+----+----+
| 1| 0|null|null|
| 2| 0| 2| 0|
|null|null| 3| 0|
+----+----+----+----+
df1: org.apache.spark.sql.DataFrame = [id: int, A: int]
df2: org.apache.spark.sql.DataFrame = [id: int, B: int]
Re: Error joining dataframes
Posted by ram kumar <ra...@gmail.com>.
I tried
df1.join(df2, df1("id") === df2("id"), "outer").show
But there is a duplicate "id" and when I query the "id", I get
*Error*: org.apache.spark.sql.AnalysisException: Reference 'Id' is
*ambiguous*, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
I am currently using spark 1.5.2.
Is there any alternative way in 1.5
Thanks
On Wed, May 18, 2016 at 12:12 PM, Takeshi Yamamuro <li...@gmail.com>
wrote:
> Also, you can pass the query that you'd like to use in spark-v1.6+;
>
> val df1 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "A")
> val df2 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "B")
> df1.join(df2, df1("id") === df2("id"), "outer").show
>
> // maropu
>
>
> On Wed, May 18, 2016 at 3:29 PM, ram kumar <ra...@gmail.com>
> wrote:
>
>> If I run as
>> val rs = s.join(t,"time_id").join(c,"channel_id")
>>
>> It takes as inner join.
>>
>>
>> On Wed, May 18, 2016 at 2:31 AM, Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>>> pretty simple, a similar construct to tables projected as DF
>>>
>>> val c = HiveContext.table("channels").select("CHANNEL_ID","CHANNEL_DESC")
>>> val t =
>>> HiveContext.table("times").select("TIME_ID","CALENDAR_MONTH_DESC")
>>> val rs = s.join(t,"time_id").join(c,"channel_id")
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 17 May 2016 at 21:52, Bijay Kumar Pathak <bk...@mtu.edu> wrote:
>>>
>>>> Hi,
>>>>
>>>> Try this one:
>>>>
>>>>
>>>> df_join = df1.*join*(df2, 'Id', "fullouter")
>>>>
>>>> Thanks,
>>>> Bijay
>>>>
>>>>
>>>> On Tue, May 17, 2016 at 9:39 AM, ram kumar <ra...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I tried to join two dataframe
>>>>>
>>>>> df_join = df1.*join*(df2, ((df1("Id") === df2("Id")), "fullouter")
>>>>>
>>>>> df_join.registerTempTable("join_test")
>>>>>
>>>>>
>>>>> When querying "Id" from "join_test"
>>>>>
>>>>> 0: jdbc:hive2://> *select Id from join_test;*
>>>>> *Error*: org.apache.spark.sql.AnalysisException: Reference 'Id' is
>>>>> *ambiguous*, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
>>>>> 0: jdbc:hive2://>
>>>>>
>>>>> Is there a way to merge the value of df1("Id") and df2("Id") into one
>>>>> "Id"
>>>>>
>>>>> Thanks
>>>>>
>>>>
>>>>
>>>
>>
>
>
> --
> ---
> Takeshi Yamamuro
>
Re: Error joining dataframes
Posted by Takeshi Yamamuro <li...@gmail.com>.
Also, you can pass the query that you'd like to use in spark-v1.6+:
val df1 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "A")
val df2 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "B")
df1.join(df2, df1("id") === df2("id"), "outer").show
// maropu
Re: Error joining dataframes
Posted by ram kumar <ra...@gmail.com>.
If I run it as
val rs = s.join(t,"time_id").join(c,"channel_id")
it is taken as an inner join.
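That matches the 1.5 API: the join(right, usingColumn) overload takes no join-type argument and always performs an inner join; the three-argument join(right, usingColumns, joinType) form only arrived in 1.6. Why the inner default silently drops rows can be sketched with plain key sets (plain Python, not Spark):

```python
def inner_join_keys(left_keys, right_keys):
    """Keys surviving an inner join: only those present on both sides."""
    return sorted(set(left_keys) & set(right_keys))

def full_outer_join_keys(left_keys, right_keys):
    """Keys surviving a full outer join: the union of both sides."""
    return sorted(set(left_keys) | set(right_keys))

s_ids = [1, 2]
t_ids = [2, 3]
print(inner_join_keys(s_ids, t_ids))       # [2] -- unmatched rows vanish
print(full_outer_join_keys(s_ids, t_ids))  # [1, 2, 3] -- every key kept
```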
Re: Error joining dataframes
Posted by Mich Talebzadeh <mi...@gmail.com>.
Pretty simple, a similar construct to tables projected as DataFrames:
val c = HiveContext.table("channels").select("CHANNEL_ID","CHANNEL_DESC")
val t = HiveContext.table("times").select("TIME_ID","CALENDAR_MONTH_DESC")
val rs = s.join(t,"time_id").join(c,"channel_id")
HTH
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
Re: Error joining dataframes
Posted by Bijay Kumar Pathak <bk...@mtu.edu>.
Hi,
Try this one:
df_join = df1.join(df2, 'Id', "fullouter")
Thanks,
Bijay