Posted to user@spark.apache.org by Ahmed Nawar <ah...@gmail.com> on 2015/03/23 09:48:54 UTC

Data/File structure Validation

Dears,

    Is there any way to validate CSV, JSON, ... files while loading them
into a DataFrame?
    I need to ignore corrupted rows (rows that do not match the schema).


Thanks,
Ahmed Nawwar

Re: Data/File structure Validation

Posted by Ahmed Nawar <ah...@gmail.com>.
Dear Taotao,

    Yes, I tried spark-csv.


Thanks,
Nawwar


On Mon, Mar 23, 2015 at 12:20 PM, Taotao.Li <ta...@datayes.com> wrote:

> can it load successfully if the format is invalid?

Re: Data/File structure Validation

Posted by "Taotao.Li" <ta...@datayes.com>.
can it load successfully if the format is invalid? 

----- Original Message -----

From: "Ahmed Nawar" <ah...@gmail.com>
To: user@spark.apache.org
Sent: Monday, March 23, 2015, 4:48:54 PM
Subject: Data/File structure Validation

Dears, 

Is there any way to validate CSV, JSON, ... files while loading them into a DataFrame?
I need to ignore corrupted rows (rows that do not match the schema).


Thanks, 
Ahmed Nawwar 



-- 


--------------------------------------------------------------------------- 

Thanks & Best regards 

李涛涛 Taotao · Li | Fixed Income@Datayes | Software Engineer

Address: Wanxiang Tower 8F, Lujiazui West Rd. No.99, Pudong New District, Shanghai, 200120

Phone: 021-60216502 | Mobile: +86-18202171279


Re: Data/File structure Validation

Posted by Ahmed Nawar <ah...@gmail.com>.
Dear Raunak,

   The source system provided logs that contain some errors. I need to make
sure each row is in the correct format (the number of columns/attributes and
the data types are correct) and move the incorrect rows to a separate list.

Of course I can write my own logic, but first I want to make sure there is
no direct way to do this.
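
For illustration only, the kind of hand-written check described above might
look like the sketch below. It is plain Python using the standard csv module,
not a Spark or spark-csv API. The three-column schema, the sample data, and
the name split_rows are all hypothetical and chosen just for this example.

```python
import csv
import io

# Hypothetical schema: (column name, converter) pairs. Each converter
# raises ValueError when a field does not have the expected type.
SCHEMA = [("id", int), ("name", str), ("score", float)]


def split_rows(lines, schema=SCHEMA):
    """Split CSV lines into (valid, rejected).

    valid    -- list of dicts with fields converted to the schema types
    rejected -- list of (raw_fields, reason) tuples for rows that have the
                wrong number of columns or a field of the wrong type
    """
    valid, rejected = [], []
    for raw in csv.reader(lines):
        # Check the number of columns first.
        if len(raw) != len(schema):
            reason = "expected %d columns, got %d" % (len(schema), len(raw))
            rejected.append((raw, reason))
            continue
        # Then check the data types by attempting each conversion.
        try:
            row = {name: convert(value)
                   for (name, convert), value in zip(schema, raw)}
        except ValueError as exc:
            rejected.append((raw, str(exc)))
            continue
        valid.append(row)
    return valid, rejected


# Made-up sample: row 2 is missing a column, row 3 has a non-numeric id.
sample = io.StringIO("1,alice,3.5\n2,bob\nx,carol,1.0\n4,dave,2.25\n")
valid, rejected = split_rows(sample)
```

Here valid holds the two well-formed rows and rejected holds the other two
together with the reason each one failed, which corresponds to the "separate
list" of incorrect rows mentioned above.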


Thanks,
Nawwar


On Mon, Mar 23, 2015 at 1:14 PM, Raunak Jhawar <ra...@gmail.com>
wrote:

> CSV is a structured format and JSON is not (it is semi-structured). It is
> natural for different JSON documents to have differing schemas. What are you
> trying to do here?
>
> --
> Thanks,
> Raunak Jhawar
> m: 09820890034
>
> On Mon, Mar 23, 2015 at 2:18 PM, Ahmed Nawar <ah...@gmail.com>
> wrote:
>
>> Dears,
>>
>>     Is there any way to validate CSV, JSON, ... files while loading them
>> into a DataFrame?
>>     I need to ignore corrupted rows (rows that do not match the schema).
>>
>>
>> Thanks,
>> Ahmed Nawwar
>>
>
>