Posted to user@spark.apache.org by Harmanat Singh <wi...@gmail.com> on 2020/06/23 07:04:51 UTC

apache-spark mongodb dataframe issue

Hi

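Please take a look at my issue at the link below. It concerns how to
differentiate between null and missing MongoDB values in a Spark DataFrame.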
https://stackoverflow.com/questions/62526118/how-to-differentiate-between-null-and-missing-mongogdb-values-in-a-spark-datafra


Kindly Help


Best
Mannat

Re: apache-spark mongodb dataframe issue

Posted by Mannat Singh <wi...@gmail.com>.

Hi Jeff,
Thanks for confirming that.

I have also considered reading every MongoDB document separately, along
with its inferred schema, and then comparing that schema to the schema of
the whole collection. For our huge database this would be a horrible
approach, as you have already mentioned.

I am researching another approach and will post here if there is a
breakthrough.

Re: apache-spark mongodb dataframe issue

Posted by Jeff Evans <je...@gmail.com>.

As far as I know, in general, there isn't a way to distinguish explicit
null values from missing ones.  (Someone please correct me if I'm wrong,
since I would love to be able to do this for my own reasons).  If you
really must do it, and don't care about performance at all (since it will
be horrible), read each object as a separate batch, while inferring the
schema.  If the schema contains the column, but the value is null, you will
know it was explicitly set that way.  If the schema doesn't contain the
column, you'll know it was missing.
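
The distinction described above can be sketched in plain Python, without
Spark or the MongoDB connector. This is only an illustration under stated
assumptions: each document is modeled as a dict, its set of keys stands in
for the schema that would be inferred from that document alone, and the
names classify_field, docs and the "age" field are hypothetical. It does not
show what the connector itself returns.

```python
from typing import Any, Dict, List

def classify_field(doc: Dict[str, Any], field: str) -> str:
    """Classify a field using only this document's own keys (its 'schema')."""
    if field not in doc:
        return "missing"        # the per-document schema lacks the column
    if doc[field] is None:
        return "explicit null"  # the column is present, the value is null
    return "value"

# Hypothetical sample documents, standing in for a MongoDB collection.
docs: List[Dict[str, Any]] = [
    {"_id": 1, "name": "a", "age": 30},
    {"_id": 2, "name": "b", "age": None},  # age explicitly set to null
    {"_id": 3, "name": "c"},               # age not present at all
]

# With one schema merged across all documents, both cases collapse to None.
merged_columns = sorted(set().union(*(d.keys() for d in docs)))
flattened = [{col: d.get(col) for col in merged_columns} for d in docs]

# Inspecting each document on its own keeps the two cases apart.
per_doc = [classify_field(d, "age") for d in docs]
```

In flattened, documents 2 and 3 both carry None for "age", which mirrors the
ambiguity in a single DataFrame. per_doc keeps them distinct, at the cost of
handling every document individually.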