Posted to user@spark.apache.org by Jerrick Hoang <je...@gmail.com> on 2015/07/13 02:03:08 UTC

SparkSQL 'describe table' tries to look at all records

Hi all,

I'm new to Spark, and this question may be trivial or may already have been
answered, but when I run 'describe table' from the SparkSQL CLI, it seems to
look at every record in the table (which takes a really long time for a big
table) instead of just returning the table's metadata. I would appreciate
any pointers. Thanks!

Re: SparkSQL 'describe table' tries to look at all records

Posted by Ted Yu <yu...@gmail.com>.
Which Spark release are you using?

Cheers

Re: SparkSQL 'describe table' tries to look at all records

Posted by Yana Kadiyska <ya...@gmail.com>.
Have you seen https://issues.apache.org/jira/browse/SPARK-6910 ? I opened
https://issues.apache.org/jira/browse/SPARK-6984 , which I think is related
to this as well. There are a bunch of issues attached to it, but basically
yes, Spark's interactions with a large metastore are bad, very bad.

On Sun, Jul 12, 2015 at 11:39 PM, Jerrick Hoang <je...@gmail.com>
wrote:

> Sorry all for not being clear. I'm using Spark 1.4, and the table is a
> partitioned Hive table.

Re: SparkSQL 'describe table' tries to look at all records

Posted by Jerrick Hoang <je...@gmail.com>.
Sorry all for not being clear. I'm using Spark 1.4, and the table is a
partitioned Hive table.

On Sun, Jul 12, 2015 at 6:36 PM, Yin Huai <yh...@databricks.com> wrote:

> Jerrick,
>
> Let me ask a few clarifying questions. What is the version of Spark? Is
> the table a Hive table? What is the format of the table? Is the table
> partitioned?
>
> Thanks,
>
> Yin

Re: SparkSQL 'describe table' tries to look at all records

Posted by Yin Huai <yh...@databricks.com>.
Jerrick,

Let me ask a few clarifying questions. What is the version of Spark? Is
the table a Hive table? What is the format of the table? Is the table
partitioned?

Thanks,

Yin
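
For anyone gathering the details asked for above, here is a rough PySpark
sketch. It assumes an existing SparkContext (sc) and a HiveContext
(sql_context) are passed in; the helper name table_diagnostics and the table
name are hypothetical. SHOW PARTITIONS is a Hive command, so this only
applies to partitioned Hive tables, and on a very large metastore the call
may itself be slow.

```python
def table_diagnostics(sc, sql_context, table_name):
    """Collect the Spark version and the partition count for one table.

    Assumptions (not from the thread): sc is a live SparkContext,
    sql_context is a HiveContext, and table_name is a partitioned
    Hive table. The function name is hypothetical.
    """
    # SHOW PARTITIONS returns one row per partition of the table.
    rows = sql_context.sql("SHOW PARTITIONS " + table_name).collect()
    return {
        "spark_version": sc.version,   # version string of the running Spark
        "partition_count": len(rows),  # number of partitions listed
    }
```

Calling table_diagnostics(sc, sqlContext, "my_table") from the pyspark shell
would answer the version and partitioning questions in one step.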

On Sun, Jul 12, 2015 at 6:01 PM, ayan guha <gu...@gmail.com> wrote:

> Describe computes statistics, so it will try to query the table. What you
> are looking for is df.printSchema().

Re: SparkSQL 'describe table' tries to look at all records

Posted by ayan guha <gu...@gmail.com>.
Describe computes statistics, so it will try to query the table. What you
are looking for is df.printSchema().
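
A minimal PySpark sketch of the schema-only route, assuming an existing
SQLContext or HiveContext is passed in as sql_context; the helper name
schema_only and the table name are hypothetical. In the DataFrame API,
describe() is the call that computes summary statistics, while printSchema()
and dtypes only read the schema. This sketch does not avoid whatever
metastore lookups Spark performs when resolving a partitioned Hive table.

```python
def schema_only(sql_context, table_name):
    """Return [(column, type)] for a table without computing statistics.

    Assumptions (not from the thread): sql_context is an existing
    SQLContext or HiveContext. The function name is hypothetical.
    """
    # table() is lazy: it resolves the table but runs no job over its rows.
    df = sql_context.table(table_name)
    # Print the column tree to stdout.
    df.printSchema()
    # dtypes is a list of (column name, type string) pairs.
    return df.dtypes
```

For example, schema_only(sqlContext, "my_table") from the pyspark shell
prints the schema and returns the column names with their types.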


-- 
Best Regards,
Ayan Guha