Posted to user@spark.apache.org by muhammet pakyürek <mp...@hotmail.com> on 2016/09/26 07:30:51 UTC

how to find NaN values of each row of a Spark DataFrame to decide whether the row is dropped or not

Is there any way to do this directly? If not, is there any way to do this indirectly using other Spark data structures?


Re: how to find NaN values of each row of a Spark DataFrame to decide whether the row is dropped or not

Posted by Peyman Mohajerian <mo...@gmail.com>.
Also take a look at this API:

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameNaFunctions
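For illustration, a minimal PySpark sketch of those na functions (the toy DataFrame and its column names are made up for this example, not part of the thread):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical DataFrame containing some NaN values.
    df = spark.createDataFrame(
        [(1.0, 2.0), (float("nan"), 3.0), (float("nan"), float("nan"))],
        ["field1", "field2"],
    )

    # Drop rows containing any null or NaN value.
    df.na.drop().show()

    # Keep only rows with at least 2 non-null, non-NaN values.
    df.na.drop(thresh=2).show()

    # Replace null/NaN values in numeric columns with a constant.
    df.na.fill(0.0).show()

The same drop/fill methods are available on the Scala DataFrameNaFunctions linked above.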

On Mon, Sep 26, 2016 at 1:09 AM, Bedrytski Aliaksandr <sp...@bedryt.ski>
wrote:

> Hi Muhammet,
>
> Python also supports SQL queries:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#running-sql-queries-programmatically
>
> Regards,
> --
>   Bedrytski Aliaksandr
>   spark@bedryt.ski
>
>
>
> On Mon, Sep 26, 2016, at 10:01, muhammet pakyürek wrote:
>
>
>
>
> But my request is related to Python, because I have designed a
> preprocessing step for data that looks for rows containing NaN values.
> If the number of NaNs in a row is above the threshold, the row is
> deleted; otherwise the NaNs are filled with a predicted value. Therefore
> I need a Python version of this process (a sketch of one approach
> follows after this message).
>
>
> ------------------------------
>
> *From:* Bedrytski Aliaksandr <sp...@bedryt.ski>
> *Sent:* Monday, September 26, 2016 7:53 AM
> *To:* muhammet pakyürek
> *Cc:* user@spark.apache.org
> *Subject:* Re: how to find NaN values of each row of a Spark DataFrame
> to decide whether the row is dropped or not
>
> Hi Muhammet,
>
> Have you tried to use SQL queries?
>
> spark.sql("""
>     SELECT
>         field1,
>         field2,
>         field3
>     FROM table1
>     WHERE
>         NOT isnan(field1)
>         AND NOT isnan(field2)
>         AND NOT isnan(field3)
> """)
>
>
> This query filters out rows containing NaN in any of the 3 columns.
>
> Regards,
> --
>   Bedrytski Aliaksandr
>   spark@bedryt.ski
>
>
>
> On Mon, Sep 26, 2016, at 09:30, muhammet pakyürek wrote:
>
>
> Is there any way to do this directly? If not, is there any way to do
> this indirectly using other Spark data structures?
>
>
>
>
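A rough PySpark sketch of the threshold-then-fill preprocessing described above (the toy data, the threshold, and the per-column mean used as a stand-in for a real predictive fill are all assumptions, not code from the thread):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, isnan, mean

    spark = SparkSession.builder.getOrCreate()

    df = spark.createDataFrame(
        [(1.0, 2.0, 3.0),
         (float("nan"), 5.0, 6.0),
         (7.0, float("nan"), float("nan"))],
        ["field1", "field2", "field3"],
    )

    max_nans = 1  # placeholder threshold

    # na.drop(thresh=n) keeps rows with at least n non-null, non-NaN
    # values, which expresses "drop rows with more than max_nans NaNs".
    kept = df.na.drop(thresh=len(df.columns) - max_nans)

    # Fill the surviving NaNs with each column's mean over its non-NaN
    # values, as a simple stand-in for a predictive model.
    means = {
        c: kept.where(~isnan(col(c))).agg(mean(col(c))).first()[0]
        for c in kept.columns
    }
    kept.na.fill(means).show()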

Re: how to find NaN values of each row of a Spark DataFrame to decide whether the row is dropped or not

Posted by Bedrytski Aliaksandr <sp...@bedryt.ski>.
Hi Muhammet,

Python also supports SQL queries:
http://spark.apache.org/docs/latest/sql-programming-guide.html#running-sql-queries-programmatically
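For example, the NaN filter can be run from Python like this (a minimal sketch; the table and column names are invented):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    df = spark.createDataFrame(
        [(1.0, 2.0), (float("nan"), 3.0)],
        ["field1", "field2"],
    )

    # Register the DataFrame as a temporary view so SQL can query it.
    df.createOrReplaceTempView("table1")

    # isnan() is a built-in Spark SQL function.
    spark.sql("""
        SELECT field1, field2
        FROM table1
        WHERE NOT isnan(field1) AND NOT isnan(field2)
    """).show()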

Regards,
--
  Bedrytski Aliaksandr
  spark@bedryt.ski



On Mon, Sep 26, 2016, at 10:01, muhammet pakyürek wrote:
>
>
>
> But my request is related to Python, because I have designed a
> preprocessing step for data that looks for rows containing NaN values.
> If the number of NaNs in a row is above the threshold, the row is
> deleted; otherwise the NaNs are filled with a predicted value.
> Therefore I need a Python version of this process.
>
>
>
> *From:* Bedrytski Aliaksandr <sp...@bedryt.ski>
> *Sent:* Monday, September 26, 2016 7:53 AM
> *To:* muhammet pakyürek
> *Cc:* user@spark.apache.org
> *Subject:* Re: how to find NaN values of each row of a Spark DataFrame
> to decide whether the row is dropped or not
>
> Hi Muhammet,
>
> Have you tried to use SQL queries?
>
>> spark.sql("""
>>     SELECT
>>         field1,
>>         field2,
>>         field3
>>     FROM table1
>>     WHERE
>>         NOT isnan(field1)
>>         AND NOT isnan(field2)
>>         AND NOT isnan(field3)
>> """)
>
> This query filters out rows containing NaN in any of the 3 columns.
>
> Regards,
> --
>   Bedrytski Aliaksandr
>   spark@bedryt.ski
>
>
>
> On Mon, Sep 26, 2016, at 09:30, muhammet pakyürek wrote:
>>
>> Is there any way to do this directly? If not, is there any way to do
>> this indirectly using other Spark data structures?
>>
>

Re: how to find NaN values of each row of a Spark DataFrame to decide whether the row is dropped or not

Posted by Bedrytski Aliaksandr <sp...@bedryt.ski>.
Hi Muhammet,

Have you tried to use SQL queries?

> spark.sql("""
>     SELECT
>         field1,
>         field2,
>         field3
>     FROM table1
>     WHERE
>         NOT isnan(field1)
>         AND NOT isnan(field2)
>         AND NOT isnan(field3)
> """)

This query filters out rows containing NaN in any of the 3 columns.
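The equivalent filter can also be written with the Python DataFrame API instead of SQL; a minimal sketch using the same made-up column names:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, isnan

    spark = SparkSession.builder.getOrCreate()

    df = spark.createDataFrame(
        [(1.0, 2.0, 3.0), (float("nan"), 5.0, 6.0)],
        ["field1", "field2", "field3"],
    )

    # Keep rows where none of the three columns is NaN, mirroring
    # the WHERE clause above.
    df.filter(
        ~isnan(col("field1"))
        & ~isnan(col("field2"))
        & ~isnan(col("field3"))
    ).show()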

Regards,
--
  Bedrytski Aliaksandr
  spark@bedryt.ski



On Mon, Sep 26, 2016, at 09:30, muhammet pakyürek wrote:
>
> Is there any way to do this directly? If not, is there any way to do
> this indirectly using other Spark data structures?
>