Posted to users@hop.apache.org by po...@gmx.com on 2022/06/03 15:31:47 UTC

Beam input - problem with data format


Hi,



For 3 hours I have been trying to do something as trivial as opening a file in
Beam input.



My file is:



80010;Some customer 1;2344;Address 1  
80011;Some customer 2;7546;Address 2  
80012;Some customer 3;4564;Address 3  
80013;Some customer 4;7564;Address 4  
80014;Some customer 5;2354;Address 5



I defined the input file in the Pipeline Run configuration - ${CUSTOMERS}

I have a Beam file definition 'customers' (see attachment)

In Beam input I have - Input location: ${CUSTOMERS}, File definition to:
customers



When I start the pipeline I receive this error:

  
2022/06/03 17:24:40 - merge_join - client_no Integer(5) : There was a data
type error: the data type of java.lang.String object [Some customer 1] does
not correspond to value meta [Integer(5)]



For some reason the first column, client_no, is completely ignored (yes, I have
';' as the field separator) and the second one is read instead.

It does not matter which line endings the file uses or which encoding it has.
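
For anyone trying to reproduce this, here is a minimal sketch of what a correct split of the sample rows should give. It is plain Python, not Hop's or Beam's parser, and every field name except client_no is a hypothetical placeholder because the thread does not show the file definition.

```python
import csv
import io

# The five sample rows quoted in the message above.
SAMPLE = """80010;Some customer 1;2344;Address 1
80011;Some customer 2;7546;Address 2
80012;Some customer 3;4564;Address 3
80013;Some customer 4;7564;Address 4
80014;Some customer 5;2354;Address 5
"""

# Hypothetical field names: only client_no is named in the thread.
FIELDS = ["client_no", "name", "code", "address"]


def parse_rows(text, delimiter=";"):
    """Split each line on the delimiter and convert client_no to an integer."""
    rows = []
    for record in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not record:
            continue  # skip empty lines
        row = dict(zip(FIELDS, (value.strip() for value in record)))
        # int() raises ValueError if the columns are shifted and a name lands here.
        row["client_no"] = int(row["client_no"])
        rows.append(row)
    return rows


rows = parse_rows(SAMPLE)
print(rows[0])
```

If the first field of every row converts to an integer here but the same file fails in Beam input, that points at the step or the file definition rather than at the data.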



If I open the same file in 'Text file input', everything works.



Any suggestions?


Re: Beam input - problem with data format

Posted by Matt Casters <ma...@neo4j.com>.
If you think it's a bug and you have a reproduction case we would very much
welcome a JIRA case for this.
Most likely we're dealing with an issue with the file encoding or the line
endings but it's still worth looking into.
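
One way to rule the encoding and line endings in or out before filing the case is to look at the raw bytes. The sketch below is only an illustration in plain Python: the function name and the report layout are made up for this example, and it checks only the UTF-8 and UTF-16 byte order marks.

```python
import codecs


def inspect_bytes(data: bytes) -> dict:
    """Report a leading byte order mark and count each line-ending style."""
    boms = {
        "UTF-8": codecs.BOM_UTF8,
        "UTF-16 LE": codecs.BOM_UTF16_LE,
        "UTF-16 BE": codecs.BOM_UTF16_BE,
    }
    # First BOM whose bytes the data starts with, or None if there is none.
    bom = next((name for name, mark in boms.items() if data.startswith(mark)), None)
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # bare LF, not part of a CRLF pair
    cr = data.count(b"\r") - crlf   # bare CR, not part of a CRLF pair
    return {"bom": bom, "crlf": crlf, "lf": lf, "cr": cr}


# In-memory example: two sample rows with a UTF-8 BOM and Windows line endings.
sample = (
    codecs.BOM_UTF8
    + b"80010;Some customer 1;2344;Address 1\r\n"
    + b"80011;Some customer 2;7546;Address 2\r\n"
)
report = inspect_bytes(sample)
print(report)
```

For a real file, the bytes can be read with open(path, "rb").read() and passed to the same function. Attaching the resulting report to the JIRA case would make the reproduction easier to follow.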

Thanks in advance!

Matt

On Tue, 7 Jun 2022 at 15:56, <po...@gmx.com> wrote:

>
> After spending another few days on it, I'm fairly convinced this is a bug in
> the software:
>
> Capture1.PNG - when executed 'Local'
>
> "2022/06/07 22:53:53 - merge_join_test - client_id Integer(15) : There was
> a data type error: the data type of java.lang.String object ["Some customer
> 4"] does not correspond to value meta [Integer(15)]"
>
> When executed with "Beam Flink pipeline engine"
>
> I tried everything.
>
> M.

Re: Beam input - problem with data format

Posted by po...@gmx.com.

After spending another few days on it, I'm fairly convinced this is a bug in
the software:



Capture1.PNG - when executed 'Local'



"2022/06/07 22:53:53 - merge_join_test - client_id Integer(15) : There was a
data type error: the data type of java.lang.String object ["Some customer 4"]
does not correspond to value meta [Integer(15)]"



When executed with "Beam Flink pipeline engine"



I tried everything.



M.
