Posted to user@flink.apache.org by po...@gmx.com on 2022/07/08 17:48:13 UTC

Does Table API connector, csv, has some option to ignore some columns

Does the Table API CSV connector have an option to ignore some columns in the
source file?

For instance, read only the first, second, ninth... but not the others?

Or is there any other trick?

    
    
    CREATE TABLE some_table (
      some_id BIGINT,
      ...
    ) WITH (
      'format' = 'csv',
      ...
    )
    








Re: Does Table API connector, csv, has some option to ignore some columns

Posted by po...@gmx.com.
Could this not work the way it did with readCsvFile and its "includeFields"
option? That would be nice.

A CSV is just a text file, and headers are not required (though they can help a
human reader).





**Sent:**  Tuesday, July 12, 2022 at 2:48 PM  
**From:**  "yuxia" <lu...@alumni.sjtu.edu.cn>  
**To:**  "podunk" <po...@gmx.com>  
**Cc:**  "User" <us...@flink.apache.org>  
**Subject:**  Re: Does Table API connector, csv, has some option to ignore
some columns

For the JSON format, you only need to define the subset of columns you want to
select in the Flink DDL.

But for the CSV format, this is not supported. If a CSV file has no header, how
can the incomplete set of columns defined in the Flink DDL be mapped to the
original fields in the file? You therefore need to declare all the columns so
that the mapping can be done. If there is a header, the mapping is possible and
would meet your requirement. However, the current implementation does not
handle that case.







Best regards,  
Yuxia



* * *

**From:**  "podunk" <po...@gmx.com>  
**To:**  "User" <us...@flink.apache.org>  
**Sent:**  Tuesday, July 12, 2022 at 5:13:05 PM  
**Subject:**  Re: Re: Does Table API connector, csv, has some option to ignore
some columns



This is really surprising.

When you import data from a file, you really rarely need to import everything
from that file. Most often it is several columns.

So the program that reads the file should be able to do this - this is the ABC
of working with data.



Often the suggestion is "you can write your script". Sure. I can. I can write
the entire program here - from scratch.

But I use a ready-made program to avoid writing my scripts.





**Sent:**  Tuesday, July 12, 2022 at 12:24 AM  
**From:**  "Alexander Fedulov" <al...@ververica.com>  
**To:**  podunk@gmx.com  
**Cc:**  "user" <us...@flink.apache.org>  
**Subject:**  Re: Re: Does Table API connector, csv, has some option to ignore
some columns

Hi podunk,



no, this is currently not possible:  
> Currently, the CSV schema is derived from table schema. [1]



So the Table schema is used to define how Jackson CSV parses the lines and
hence needs to be complete.



[1] <https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/formats/csv/>



Best,

Alexander Fedulov



On Mon, Jul 11, 2022 at 5:43 PM <[podunk@gmx.com](mailto:podunk@gmx.com)>
wrote:

> No, that is not what I meant.
>
> I asked: 'Does the Table API CSV connector have an option to ignore some
> columns in the source file?'
>
>  
>
>  
>
> **Sent:**  Monday, July 11, 2022 at 5:28 PM  
>  **From:**  "Xuyang" <[xyzhong131@163.com](mailto:xyzhong131@163.com)>  
>  **To:**  [podunk@gmx.com](mailto:podunk@gmx.com)  
>  **Cc:**  [user@flink.apache.org](mailto:user@flink.apache.org)  
>  **Subject:**  Re:Re: Does Table API connector, csv, has some option to
> ignore some columns
>
> Hi, did you mean `insert into table1 select col1, col2, col3 ... from
> table2`?
>
>  
>
> If this doesn't meet your requirement, what about using a UDF to customize
> what you want at runtime?
>
>  
>
> \--
>
>     Best!
>
>     Xuyang
>
>  
>
>  
>
>  
>
> At 2022-07-11 16:10:00, [podunk@gmx.com](mailto:podunk@gmx.com) wrote:
>
>> I want to control what I insert into the table, not what I get from the
>> table.
>>
>> **Sent:**  Monday, July 11, 2022 at 3:37 AM  
>> **From:**  "Shengkai Fang" <[fskmine@gmail.com](mailto:fskmine@gmail.com)>  
>> **To:**  [podunk@gmx.com](mailto:podunk@gmx.com)  
>> **Cc:**  "user" <[user@flink.apache.org](mailto:user@flink.apache.org)>  
>> **Subject:**  Re: Does Table API connector, csv, has some option to ignore
>> some columns
>>
>> Hi.
>>
>> In Flink SQL, you can select the columns that you want in the query. For
>> example, you can use
>>
>> ```
>> SELECT col_a, col_b FROM some_table;
>> ```
>>
>> Best,
>>
>> Shengkai
>>
>> On Sat, Jul 9, 2022 at 01:48, <[podunk@gmx.com](mailto:podunk@gmx.com)> wrote:
>>
>>> Does the Table API CSV connector have an option to ignore some columns in
>>> the source file?
>>>
>>> For instance, read only the first, second, ninth... but not the others?
>>>
>>> Or is there any other trick?
>>>
>>>     CREATE TABLE some_table (
>>>       some_id BIGINT,
>>>       ...
>>>     ) WITH (
>>>       'format' = 'csv',
>>>       ...
>>>     )


Re: Does Table API connector, csv, has some option to ignore some columns

Posted by yuxia <lu...@alumni.sjtu.edu.cn>.
For the JSON format, you only need to define the subset of columns you want to select in the Flink DDL.
But for the CSV format, this is not supported. If a CSV file has no header, how can the incomplete set of columns defined in the Flink DDL be mapped to the original fields in the file? You therefore need to declare all the columns so that the mapping can be done. If there is a header, the mapping is possible and would meet your requirement. However, the current implementation does not handle that case.



Best regards, 
Yuxia 
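
The JSON side of the contrast described above can be sketched in Flink SQL. This is only an illustration under assumptions: the table name, column names, and file path are hypothetical, and the `filesystem` connector is assumed as the source. Because JSON fields carry their names, the DDL can list just the columns of interest:

```sql
-- Hypothetical source: each line of the file is a JSON object with many fields.
-- Only the two fields declared here are selected; the others are simply
-- not declared, which works because JSON fields are matched by name.
CREATE TABLE json_source (
  some_id   BIGINT,
  some_name STRING
) WITH (
  'connector' = 'filesystem',
  'path'      = 'file:///path/to/input.json',  -- placeholder path
  'format'    = 'json'
);
```

A CSV line without a header has no names to match on, which is the reason given above for having to declare every column when `'format' = 'csv'` is used.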





Re: Re: Does Table API connector, csv, has some option to ignore some columns

Posted by po...@gmx.com.
This is really surprising.

When you import data from a file, you really rarely need to import everything
from that file. Most often it is several columns.

So the program that reads the file should be able to do this - this is the ABC
of working with data.



Often the suggestion is "you can write your script". Sure. I can. I can write
the entire program here - from scratch.

But I use a ready-made program to avoid writing my scripts.







Re: Re: Does Table API connector, csv, has some option to ignore some columns

Posted by Alexander Fedulov <al...@ververica.com>.
Hi podunk,

no, this is currently not possible:
> Currently, the CSV schema is derived from table schema. [1]

So the Table schema is used to define how Jackson CSV parses the lines and
hence needs to be complete.

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/formats/csv/

Best,
Alexander Fedulov
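
A minimal sketch of the workaround this implies: declare every column of the file so the CSV schema is complete, then expose only the wanted ones through a view. The five-column layout, the table, view, and column names, and the file path are assumptions made for illustration; the `filesystem` connector is assumed as the source.

```sql
-- Hypothetical file layout: five columns, of which only the first two are wanted.
-- Every column is declared, in file order, so that the CSV schema is complete.
CREATE TABLE csv_source (
  some_id   BIGINT,
  some_name STRING,
  unused_3  STRING,   -- placeholder for a column that is not needed
  unused_4  STRING,   -- placeholder for a column that is not needed
  unused_5  STRING    -- placeholder for a column that is not needed
) WITH (
  'connector' = 'filesystem',
  'path'      = 'file:///path/to/input.csv',   -- placeholder path
  'format'    = 'csv'
);

-- Expose only the wanted columns to the rest of the job.
CREATE VIEW wanted_columns AS
SELECT some_id, some_name
FROM csv_source;
```

Declaring the unneeded columns as `STRING` is a design choice of this sketch: it avoids having to know or parse their real types.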


Re: Re: Does Table API connector, csv, has some option to ignore some columns

Posted by po...@gmx.com.
No, that is not what I meant.

I asked: 'Does the Table API CSV connector have an option to ignore some columns
in the source file?'







Re:Re: Does Table API connector, csv, has some option to ignore some columns

Posted by Xuyang <xy...@163.com>.
Hi, did you mean `insert into table1 select col1, col2, col3 ... from table2`?




If this doesn't meet your requirement, what about using a UDF to customize what you want at runtime?




--

    Best!
    Xuyang
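
Spelled out, the `insert into ... select` suggestion could look like the sketch below. The names `table1`, `table2`, and `col1` to `col3` come from the message above; the column types, the connectors, and the file path are assumptions for illustration only.

```sql
-- Hypothetical CSV source with three columns, all of them declared.
CREATE TABLE table2 (
  col1 BIGINT,
  col2 STRING,
  col3 STRING
) WITH (
  'connector' = 'filesystem',
  'path'      = 'file:///path/to/input.csv',   -- placeholder path
  'format'    = 'csv'
);

-- Hypothetical sink that receives only two of the columns;
-- the 'print' connector writes each row to standard output.
CREATE TABLE table1 (
  col1 BIGINT,
  col2 STRING
) WITH (
  'connector' = 'print'
);

-- Only the selected columns reach the sink; col3 is read but dropped.
INSERT INTO table1
SELECT col1, col2
FROM table2;
```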





Re: Does Table API connector, csv, has some option to ignore some columns

Posted by po...@gmx.com.
I want to control what I insert into the table, not what I get from the table.







Re: Does Table API connector, csv, has some option to ignore some columns

Posted by Shengkai Fang <fs...@gmail.com>.
Hi.

In Flink SQL, you can select the columns that you want in the query. For
example, you can use

```
SELECT col_a, col_b FROM some_table;
```

Best,
Shengkai

