Posted to user@spark.apache.org by bi...@bitfox.top on 2021/12/24 02:04:00 UTC

Dataframe's storage size

Hello

Is it possible to know a dataframe's total storage size in bytes? Such as:

>>> df.size()
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "/opt/spark/python/pyspark/sql/dataframe.py", line 1660, in 
__getattr__
     "'%s' object has no attribute '%s'" % (self.__class__.__name__, 
name))
AttributeError: 'DataFrame' object has no attribute 'size'

Sure it won't work, but if there were such a method that would be great.

Thanks.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Dataframe's storage size

Posted by Gourav Sengupta <go...@gmail.com>.
Hi,

even cached data occupies different amounts of memory for dataframes with
exactly the same contents, depending on a lot of conditions.

I generally try to understand the problem before jumping to conclusions
through assumptions, sadly a habit I cannot overcome.

Is there a way to understand what the person is trying to achieve here by
knowing the size of the dataframe?



Regards,
Gourav

On Fri, Dec 24, 2021 at 2:49 PM Sean Owen <sr...@gmail.com> wrote:

> I assume it means size in memory when cached, which does make sense.
> Fastest thing is to look at it in the UI Storage tab after it is cached.
>
> On Fri, Dec 24, 2021, 4:54 AM Gourav Sengupta <go...@gmail.com>
> wrote:
>
>> Hi,
>>
>> This question, once again like the last one, does not make much sense at
>> all. Where are you trying to store the data frame, and how?
>>
>> Are you just trying to write a blog, as you were mentioning in an earlier
>> email, and trying to fill in some gaps? I think that the questions are
>> entirely wrong.
>>
>> Regards,
>> Gourav Sengupta
>>
>> On Fri, Dec 24, 2021 at 2:04 AM <bi...@bitfox.top> wrote:
>>
>>> Hello
>>>
>>> Is it possible to know a dataframe's total storage size in bytes? such
>>> as:
>>>
>>> >>> df.size()
>>> Traceback (most recent call last):
>>>    File "<stdin>", line 1, in <module>
>>>    File "/opt/spark/python/pyspark/sql/dataframe.py", line 1660, in
>>> __getattr__
>>>      "'%s' object has no attribute '%s'" % (self.__class__.__name__,
>>> name))
>>> AttributeError: 'DataFrame' object has no attribute 'size'
>>>
>>> Sure it won't work. but if there is such a method that would be great.
>>>
>>> Thanks.
>>>

Re: Dataframe's storage size

Posted by Sean Owen <sr...@gmail.com>.
I assume it means size in memory when cached, which does make sense.
Fastest thing is to look at it in the UI Storage tab after it is cached.

On Fri, Dec 24, 2021, 4:54 AM Gourav Sengupta <go...@gmail.com>
wrote:

> Hi,
>
> This question, once again like the last one, does not make much sense at
> all. Where are you trying to store the data frame, and how?
>
> Are you just trying to write a blog, as you were mentioning in an earlier
> email, and trying to fill in some gaps? I think that the questions are
> entirely wrong.
>
> Regards,
> Gourav Sengupta
>
> On Fri, Dec 24, 2021 at 2:04 AM <bi...@bitfox.top> wrote:
>
>> Hello
>>
>> Is it possible to know a dataframe's total storage size in bytes? such
>> as:
>>
>> >>> df.size()
>> Traceback (most recent call last):
>>    File "<stdin>", line 1, in <module>
>>    File "/opt/spark/python/pyspark/sql/dataframe.py", line 1660, in
>> __getattr__
>>      "'%s' object has no attribute '%s'" % (self.__class__.__name__,
>> name))
>> AttributeError: 'DataFrame' object has no attribute 'size'
>>
>> Sure it won't work. but if there is such a method that would be great.
>>
>> Thanks.
>>

Re: Dataframe's storage size

Posted by Gourav Sengupta <go...@gmail.com>.
Hi,

This question, once again like the last one, does not make much sense at
all. Where are you trying to store the data frame, and how?

Are you just trying to write a blog, as you were mentioning in an earlier
email, and trying to fill in some gaps? I think that the questions are
entirely wrong.

Regards,
Gourav Sengupta

On Fri, Dec 24, 2021 at 2:04 AM <bi...@bitfox.top> wrote:

> Hello
>
> Is it possible to know a dataframe's total storage size in bytes? such
> as:
>
> >>> df.size()
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
>    File "/opt/spark/python/pyspark/sql/dataframe.py", line 1660, in
> __getattr__
>      "'%s' object has no attribute '%s'" % (self.__class__.__name__,
> name))
> AttributeError: 'DataFrame' object has no attribute 'size'
>
> Sure it won't work. but if there is such a method that would be great.
>
> Thanks.
>