Posted to user@spark.apache.org by ram kumar <ra...@gmail.com> on 2016/05/17 16:39:57 UTC
Error joining dataframes
Hi,
I tried to join two dataframes:
df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")
df_join.registerTempTable("join_test")
When querying "Id" from "join_test":
0: jdbc:hive2://> select Id from join_test;
Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is
ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
0: jdbc:hive2://>
Is there a way to merge the values of df1("Id") and df2("Id") into one "Id" column?
Thanks
Re: Error joining dataframes
Posted by Takeshi Yamamuro <li...@gmail.com>.
You can use this API in spark-v1.6+:
https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L454
// maropu
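The linked overload, df1.join(df2, Seq("Id"), "fullouter") in 1.6+, takes the join column name plus a join type and emits the join column only once. The schema difference between the condition form and the using-column form can be sketched outside Spark in plain Python (the helper names below are hypothetical, not Spark APIs):

```python
def condition_join_schema(left_cols, right_cols):
    """Schema after df1.join(df2, df1("id") === df2("id"), ...):
    every column from both sides is kept, so "id" appears twice."""
    return left_cols + right_cols

def using_column_join_schema(left_cols, right_cols, using):
    """Schema after df1.join(df2, Seq("id"), ...): the join
    column appears only once, followed by the remaining columns."""
    return ([using]
            + [c for c in left_cols if c != using]
            + [c for c in right_cols if c != using])

print(condition_join_schema(["id", "A"], ["id", "B"]))
# ['id', 'A', 'id', 'B'] -- the ambiguous case from this thread
print(using_column_join_schema(["id", "A"], ["id", "B"], "id"))
# ['id', 'A', 'B'] -- a single, unambiguous "id"
```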
On Wed, May 18, 2016 at 3:16 PM, ram kumar <ra...@gmail.com> wrote:
> I tried
>
> scala> var df_join = df1.join(df2, "Id", "fullouter")
> <console>:27: error: type mismatch;
> found : String("Id")
> required: org.apache.spark.sql.Column
> var df_join = df1.join(df2, "Id", "fullouter")
> ^
>
> scala>
>
> And I cant see the above method in
>
> https://spark.apache.org/docs/1.5.1/api/java/org/apache/spark/sql/DataFrame.html#join(org.apache.spark.sql.DataFrame,%20org.apache.spark.sql.Column,%20java.lang.String)
>
> On Wed, May 18, 2016 at 2:22 AM, Bijay Kumar Pathak <bk...@mtu.edu>
> wrote:
>
>> Hi,
>>
>> Try this one:
>>
>>
>> df_join = df1.*join*(df2, 'Id', "fullouter")
>>
>> Thanks,
>> Bijay
>>
>>
>> On Tue, May 17, 2016 at 9:39 AM, ram kumar <ra...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I tried to join two dataframe
>>>
>>> df_join = df1.*join*(df2, ((df1("Id") === df2("Id")), "fullouter")
>>>
>>> df_join.registerTempTable("join_test")
>>>
>>>
>>> When querying "Id" from "join_test"
>>>
>>> 0: jdbc:hive2://> *select Id from join_test;*
>>> *Error*: org.apache.spark.sql.AnalysisException: Reference 'Id' is
>>> *ambiguous*, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
>>> 0: jdbc:hive2://>
>>>
>>> Is there a way to merge the value of df1("Id") and df2("Id") into one
>>> "Id"
>>>
>>> Thanks
>>>
>>
>>
>
--
---
Takeshi Yamamuro
Re: Error joining dataframes
Posted by ram kumar <ra...@gmail.com>.
I tried it.
e.g.:
df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")
+----+----+----+----+
| id| A| id| B|
+----+----+----+----+
| 1| 0|null|null|
| 2| 0| 2| 0|
|null|null| 3| 0|
+----+----+----+----+
If I try,
df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter").drop(df1("Id"))
+----+----+----+
| A| id| B|
+----+----+----+
| 0|null|null|
| 0| 2| 0|
|null| 3| 0|
+----+----+----+
The value "id" = 1 is lost.
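One common workaround on 1.5, instead of dropping one side, is to select coalesce(df1("Id"), df2("Id")) as the merged key, so rows present on only one side keep their id. As a plain-Python sketch of that semantics (full_outer_merge is a hypothetical helper, not a Spark API):

```python
def full_outer_merge(left, right, key):
    """Full outer join of two lists of dicts on `key`, coalescing
    the key from whichever side is present so no row loses it."""
    left_keys = {row[key] for row in left}
    right_by_key = {row[key]: row for row in right}
    out = []
    for row in left:
        match = right_by_key.get(row[key], {})
        out.append({**match, **row})  # key and left columns win on overlap
    for row in right:
        if row[key] not in left_keys:
            out.append(dict(row))     # key coalesced from the right side
    return out

df1 = [{"id": 1, "A": 0}, {"id": 2, "A": 0}]
df2 = [{"id": 2, "B": 0}, {"id": 3, "B": 0}]
joined = full_outer_merge(df1, df2, "id")
# ids 1, 2 and 3 all survive, each with a single "id" value
```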
Re: Error joining dataframes
Posted by Divya Gehlot <di...@gmail.com>.
Can you try var df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter").drop(df1("Id"))
Re: Error joining dataframes
Posted by ram kumar <ra...@gmail.com>.
I tried
scala> var df_join = df1.join(df2, "Id", "fullouter")
<console>:27: error: type mismatch;
found : String("Id")
required: org.apache.spark.sql.Column
var df_join = df1.join(df2, "Id", "fullouter")
^
scala>
And I can't see the above method in
https://spark.apache.org/docs/1.5.1/api/java/org/apache/spark/sql/DataFrame.html#join(org.apache.spark.sql.DataFrame,%20org.apache.spark.sql.Column,%20java.lang.String)
Re: Error joining dataframes
Posted by Takeshi Yamamuro <li...@gmail.com>.
Ah, yes. `df_join` has two `id` columns, so you need to select which `id` you use:
scala> :paste
// Entering paste mode (ctrl-D to finish)
val df1 = Seq((1, 0), (2, 0)).toDF("id", "A")
val df2 = Seq((2, 0), (3, 0)).toDF("id", "B")
val df3 = df1.join(df2, df1("id") === df2("id"), "outer")
df3.printSchema
df3.select(df1("id")).show
// Exiting paste mode, now interpreting.
root
|-- id: integer (nullable = true)
|-- A: integer (nullable = true)
|-- id: integer (nullable = true)
|-- B: integer (nullable = true)
+----+
| id|
+----+
| 1|
| 2|
|null|
+----+
Re: Error joining dataframes
Posted by ram kumar <ra...@gmail.com>.
When you register a temp table from the dataframe
eg:
var df_join = df1.join(df2, df1("id") === df2("id"), "outer")
df_join.registerTempTable("test")
sqlContext.sql("select * from test")
+----+----+----+----+
| id| A| id| B|
+----+----+----+----+
| 1| 0|null|null|
| 2| 0| 2| 0|
|null|null| 3| 0|
+----+----+----+----+
But when you query the "id":
sqlContext.sql("select id from test")
Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is
ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
Re: Error joining dataframes
Posted by Takeshi Yamamuro <li...@gmail.com>.
Looks weird; it seems spark-v1.5.x can accept the query.
What's the difference between this example and your query?
####
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
      /_/
scala> :paste
// Entering paste mode (ctrl-D to finish)
val df1 = Seq((1, 0), (2, 0)).toDF("id", "A")
val df2 = Seq((2, 0), (3, 0)).toDF("id", "B")
df1.join(df2, df1("id") === df2("id"), "outer").show
// Exiting paste mode, now interpreting.
+----+----+----+----+
| id| A| id| B|
+----+----+----+----+
| 1| 0|null|null|
| 2| 0| 2| 0|
|null|null| 3| 0|
+----+----+----+----+
df1: org.apache.spark.sql.DataFrame = [id: int, A: int]
df2: org.apache.spark.sql.DataFrame = [id: int, B: int]
Re: Error joining dataframes
Posted by ram kumar <ra...@gmail.com>.
I tried
df1.join(df2, df1("id") === df2("id"), "outer").show
But there is a duplicate "id" and when I query the "id", I get
*Error*: org.apache.spark.sql.AnalysisException: Reference 'Id' is
*ambiguous*, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
I am currently using spark 1.5.2.
Is there any alternative way in 1.5
Thanks
On Wed, May 18, 2016 at 12:12 PM, Takeshi Yamamuro <li...@gmail.com>
wrote:
> Also, you can pass the query that you'd like to use in spark-v1.6+;
>
> val df1 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "A")
> val df2 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "B")
> df1.join(df2, df1("id") === df2("id"), "outer").show
>
> // maropu
>
>
> On Wed, May 18, 2016 at 3:29 PM, ram kumar <ra...@gmail.com>
> wrote:
>
>> If I run as
>> val rs = s.join(t,"time_id").join(c,"channel_id")
>>
>> It takes as inner join.
>>
>>
>> On Wed, May 18, 2016 at 2:31 AM, Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>>> pretty simple, a similar construct to tables projected as DF
>>>
>>> val c = HiveContext.table("channels").select("CHANNEL_ID","CHANNEL_DESC")
>>> val t =
>>> HiveContext.table("times").select("TIME_ID","CALENDAR_MONTH_DESC")
>>> val rs = s.join(t,"time_id").join(c,"channel_id")
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 17 May 2016 at 21:52, Bijay Kumar Pathak <bk...@mtu.edu> wrote:
>>>
>>>> Hi,
>>>>
>>>> Try this one:
>>>>
>>>>
>>>> df_join = df1.*join*(df2, 'Id', "fullouter")
>>>>
>>>> Thanks,
>>>> Bijay
>>>>
>>>>
>>>> On Tue, May 17, 2016 at 9:39 AM, ram kumar <ra...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I tried to join two dataframe
>>>>>
>>>>> df_join = df1.*join*(df2, ((df1("Id") === df2("Id")), "fullouter")
>>>>>
>>>>> df_join.registerTempTable("join_test")
>>>>>
>>>>>
>>>>> When querying "Id" from "join_test"
>>>>>
>>>>> 0: jdbc:hive2://> *select Id from join_test;*
>>>>> *Error*: org.apache.spark.sql.AnalysisException: Reference 'Id' is
>>>>> *ambiguous*, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
>>>>> 0: jdbc:hive2://>
>>>>>
>>>>> Is there a way to merge the value of df1("Id") and df2("Id") into one
>>>>> "Id"
>>>>>
>>>>> Thanks
>>>>>
>>>>
>>>>
>>>
>>
>
>
> --
> ---
> Takeshi Yamamuro
>
Re: Error joining dataframes
Posted by Takeshi Yamamuro <li...@gmail.com>.
Also, you can pass the query that you'd like to use in spark-v1.6+:
val df1 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "A")
val df2 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "B")
df1.join(df2, df1("id") === df2("id"), "outer").show
// maropu
Re: Error joining dataframes
Posted by ram kumar <ra...@gmail.com>.
If I run it as
val rs = s.join(t,"time_id").join(c,"channel_id")
it is taken as an inner join.
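That matches the 1.5 API: the join(right, usingColumn) overload takes no join-type argument and always performs an inner join; the three-argument join(right, usingColumns, joinType) form only arrived in 1.6. Why the inner default silently drops rows can be sketched with plain key sets (plain Python, not Spark):

```python
def inner_join_keys(left_keys, right_keys):
    """Keys surviving an inner join: only those present on both sides."""
    return sorted(set(left_keys) & set(right_keys))

def full_outer_join_keys(left_keys, right_keys):
    """Keys surviving a full outer join: the union of both sides."""
    return sorted(set(left_keys) | set(right_keys))

s_ids = [1, 2]
t_ids = [2, 3]
print(inner_join_keys(s_ids, t_ids))       # [2] -- unmatched rows vanish
print(full_outer_join_keys(s_ids, t_ids))  # [1, 2, 3] -- every key kept
```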
Re: Error joining dataframes
Posted by Mich Talebzadeh <mi...@gmail.com>.
Pretty simple, a similar construct to tables projected as DataFrames:
val c = HiveContext.table("channels").select("CHANNEL_ID","CHANNEL_DESC")
val t = HiveContext.table("times").select("TIME_ID","CALENDAR_MONTH_DESC")
val rs = s.join(t,"time_id").join(c,"channel_id")
HTH
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
Re: Error joining dataframes
Posted by Bijay Kumar Pathak <bk...@mtu.edu>.
Hi,
Try this one:
df_join = df1.join(df2, 'Id', "fullouter")
Thanks,
Bijay