Posted to user@spark.apache.org by Ahmed Nawar <ah...@gmail.com> on 2015/03/23 09:48:54 UTC
Data/File structure Validation
Dear all,
Is there any way to validate CSV, JSON, and similar files while loading them
into a DataFrame?
I need to ignore corrupted rows (rows that do not match the schema).
Thanks,
Ahmed Nawwar
Re: Data/File structure Validation
Posted by Ahmed Nawar <ah...@gmail.com>.
Dear Taotao,
Yes, I tried sparkCSV.
Thanks,
Nawwar
On Mon, Mar 23, 2015 at 12:20 PM, Taotao.Li <ta...@datayes.com> wrote:
> Can it load successfully if the format is invalid?
Re: Data/File structure Validation
Posted by "Taotao.Li" <ta...@datayes.com>.
Can it load successfully if the format is invalid?
--
---------------------------------------------------------------------------
Thanks & Best regards
李涛涛 Taotao · Li | Fixed Income@Datayes | Software Engineer
Address: Wanxiang Tower 8F, Lujiazui West Rd. No. 99, Pudong New District, Shanghai, 200120
Phone: 021-60216502 | Mobile: +86-18202171279
Re: Data/File structure Validation
Posted by Ahmed Nawar <ah...@gmail.com>.
Dear Raunak,
The source system provided logs that contain some errors. I need to make
sure each row is in the correct format (the number of columns/attributes and
the data types are correct) and move the incorrect rows to a separate list.
Of course I can write my own logic, but first I want to make sure there is
no direct way to do this.
Thanks,
Nawwar
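
For illustration only, the hand-written logic described above (check the
number of columns and the data types of each row, and move the incorrect rows
to a separate list) could look roughly like the plain-Python sketch below. It
is not a Spark or spark-csv API. The schema, the function name split_rows, and
the sample rows are all hypothetical and only stand in for the real log format.

```python
import csv
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical schema for illustration: (column name, converter).
# A converter raises ValueError when a field cannot be parsed as that type.
Schema = Sequence[Tuple[str, Callable[[str], object]]]

EXAMPLE_SCHEMA: Schema = [("id", int), ("name", str), ("score", float)]


def split_rows(lines: Iterable[str], schema: Schema):
    """Return (valid, rejected).

    valid    -- rows converted to typed tuples
    rejected -- (raw fields, reason) pairs for rows that do not fit the schema
    """
    valid: List[tuple] = []
    rejected: List[Tuple[List[str], str]] = []
    for fields in csv.reader(lines):
        # Check 1: the row must have exactly as many fields as the schema.
        if len(fields) != len(schema):
            reason = f"expected {len(schema)} columns, got {len(fields)}"
            rejected.append((fields, reason))
            continue
        # Check 2: every field must convert to its declared type.
        try:
            valid.append(tuple(conv(f) for (_, conv), f in zip(schema, fields)))
        except ValueError as err:
            rejected.append((fields, f"type error: {err}"))
    return valid, rejected


# Made-up sample rows: one is too short, one has a non-numeric id.
sample = ["1,alice,3.5", "2,bob", "x,carol,1.0", "4,dave,2"]
good, bad = split_rows(sample, EXAMPLE_SCHEMA)
```

Here good holds the typed rows and bad holds the rejected rows together with
the reason each one failed. In Spark, a function along these lines could
presumably be applied to the raw lines before building the DataFrame (for
example per partition), but that wiring is an assumption and is not shown.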
On Mon, Mar 23, 2015 at 1:14 PM, Raunak Jhawar <ra...@gmail.com>
wrote:
> CSV is a structured format and JSON is not (it is semi-structured). It is
> normal for different JSON documents to have differing schemas. What are you
> trying to do here?
>
> --
> Thanks,
> Raunak Jhawar
> m: 09820890034