Posted to user@spark.apache.org by unk1102 <um...@gmail.com> on 2015/10/06 17:51:37 UTC

ORC files created by Spark job can't be accessed using hive table

Hi, I have a Spark job which creates ORC files in partitions using the
following code:

dataFrame.write().mode(SaveMode.Append).partitionBy("entity","date").format("orc").save("baseTable");

The above code successfully creates ORC files that are readable as a Spark
DataFrame.

But when I try to read the ORC files generated by the above code through a
Hive ORC table or a Hive external table, nothing gets printed; the table looks
empty. What's wrong here? I can see the ORC files in HDFS, but the Hive table
does not read them. Please guide.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/ORC-files-created-by-Spark-job-can-t-be-accessed-using-hive-table-tp24954.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: ORC files created by Spark job can't be accessed using hive table

Posted by Umesh Kacha <um...@gmail.com>.
Thanks Michael, so files written by the following code using Spark 1.5.1
should be recognizable by a Hive table, right?

dataFrame.write().mode(SaveMode.Append).partitionBy("entity","date").format("orc").save("baseTable");

Hive console:
CREATE EXTERNAL TABLE bla bla
STORED AS ORC
LOCATION '/user/xyz/baseTable';
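
(Spelled out, a minimal sketch of that DDL, assuming the entity/date partition
columns from the partitionBy call above; the other column names and types are
hypothetical placeholders. Since date is a Hive keyword, backticking it is
safest:)

CREATE EXTERNAL TABLE baseTable (
  id INT,                 -- hypothetical data columns; use the real schema
  value STRING
)
PARTITIONED BY (entity STRING, `date` STRING)
STORED AS ORC
LOCATION '/user/xyz/baseTable';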

On Tue, Oct 6, 2015 at 10:54 PM, Michael Armbrust <mi...@databricks.com>
wrote:

> I believe this is fixed in Spark 1.5.1, as long as the table only uses
> types that Hive understands and is not partitioned.  The problem with
> partitioned tables is that Hive does not support dynamic partition
> discovery unless you manually run the repair command.
>
> On Tue, Oct 6, 2015 at 9:33 AM, Umesh Kacha <um...@gmail.com> wrote:
>
>> Hi Ted, thanks, I know; I solved that by using a DataFrame for both
>> reading and writing. I am running into a different problem now: if Spark
>> can read Hive ORC files, why can't Hive read ORC files created by Spark?
>> On Oct 6, 2015 9:28 PM, "Ted Yu" <yu...@gmail.com> wrote:
>>
>>> See this thread:
>>> http://search-hadoop.com/m/q3RTtwwjNxXvPEe1
>>>
>>> A brief search in Spark JIRAs didn't find anything opened on this
>>> subject.
>>>
>>> On Tue, Oct 6, 2015 at 8:51 AM, unk1102 <um...@gmail.com> wrote:
>>>
>>>> [original message trimmed]
>>>
>

Re: ORC files created by Spark job can't be accessed using hive table

Posted by Michael Armbrust <mi...@databricks.com>.
I believe this is fixed in Spark 1.5.1, as long as the table only uses
types that Hive understands and is not partitioned.  The problem with
partitioned tables is that Hive does not support dynamic partition discovery
unless you manually run the repair command.
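
(Concretely, the repair command here is Hive's MSCK REPAIR TABLE; the table
name below is the hypothetical one from the DDL sketch above:)

-- Run in the Hive console after Spark appends new partitions, so the
-- metastore registers the new partition directories:
MSCK REPAIR TABLE baseTable;

-- Alternatively, a single partition can be registered by hand
-- (partition values here are placeholders):
ALTER TABLE baseTable ADD IF NOT EXISTS PARTITION (entity='x', `date`='2015-10-06');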

On Tue, Oct 6, 2015 at 9:33 AM, Umesh Kacha <um...@gmail.com> wrote:

> Hi Ted, thanks, I know; I solved that by using a DataFrame for both reading
> and writing. I am running into a different problem now: if Spark can read
> Hive ORC files, why can't Hive read ORC files created by Spark?
> On Oct 6, 2015 9:28 PM, "Ted Yu" <yu...@gmail.com> wrote:
>
>> See this thread:
>> http://search-hadoop.com/m/q3RTtwwjNxXvPEe1
>>
>> A brief search in Spark JIRAs didn't find anything opened on this subject.
>>
>> On Tue, Oct 6, 2015 at 8:51 AM, unk1102 <um...@gmail.com> wrote:
>>
>>> [original message trimmed]
>>

Re: ORC files created by Spark job can't be accessed using hive table

Posted by Umesh Kacha <um...@gmail.com>.
Hi Ted, thanks, I know; I solved that by using a DataFrame for both reading
and writing. I am running into a different problem now: if Spark can read
Hive ORC files, why can't Hive read ORC files created by Spark?
On Oct 6, 2015 9:28 PM, "Ted Yu" <yu...@gmail.com> wrote:

> See this thread:
> http://search-hadoop.com/m/q3RTtwwjNxXvPEe1
>
> A brief search in Spark JIRAs didn't find anything opened on this subject.
>
> On Tue, Oct 6, 2015 at 8:51 AM, unk1102 <um...@gmail.com> wrote:
>
>> [original message trimmed]
>

Re: ORC files created by Spark job can't be accessed using hive table

Posted by Ted Yu <yu...@gmail.com>.
See this thread:
http://search-hadoop.com/m/q3RTtwwjNxXvPEe1

A brief search in Spark JIRAs didn't find anything opened on this subject.

On Tue, Oct 6, 2015 at 8:51 AM, unk1102 <um...@gmail.com> wrote:

> [original message trimmed]
>