Posted to dev@apex.apache.org by Tushar Gosavi <tu...@gmail.com> on 2015/09/30 15:24:25 UTC

Re: [malhar-users] HDFS file read

Hi,

Moving this thread to dev@apex.

Which operator are you using for reading HDFS files? If you have written your
own operator for parsing, then can you please check your parsing logic
separately and make sure that it works before adding it into the operator.

- Tushar.
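
One way to check the parsing logic on its own, outside any operator, is a small standalone class. This is only a sketch: the names ParseCheck and parseRow are hypothetical, and the 15-column row is synthetic data built for the check, not the real HDFS data.

```java
import java.util.Arrays;

public class ParseCheck {
    // Hypothetical helper: splits one row on the U+0001 control character.
    // The -1 limit keeps trailing empty columns instead of dropping them.
    static String[] parseRow(String line) {
        return line.split("\u0001", -1);
    }

    public static void main(String[] args) {
        // Build a synthetic 15-column row so the expected result is known.
        String[] expected = new String[15];
        for (int i = 0; i < expected.length; i++) {
            expected[i] = "col" + i;
        }
        String line = String.join("\u0001", expected);

        String[] actual = parseRow(line);
        System.out.println("columns: " + actual.length);                 // columns: 15
        System.out.println("match: " + Arrays.equals(expected, actual)); // match: true
    }
}
```

If this passes but the operator still produces one column, the difference is more likely in the data actually read than in the split call.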


On Wed, Sep 30, 2015 at 4:11 PM, <ka...@gmail.com> wrote:

> Hi,
> My requirement is to read an HDFS file that uses "\001" as the separator.
> While developing the code in DataTorrent, it is unable to find the \001
> separator in the file. Each row actually has 15 columns, but it is being
> read as a single column.
>
> Kindly suggest how to overcome this.
>
> Below is sample data from the HDFS file.
> 1855003555798283DTVDTV2015-08-07E2600077594.992015-08-282015-08-28
> 18:29:42CHG9003REGCA201508P
> 1910001924128448DTVDTV2013-02-07P21407.22015-08-282015-08-28
> 15:20:24CHG9002REGIL201508P
>
> In total there are 2 rows, and each row has 15 columns.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Malhar" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to malhar-users+unsubscribe@googlegroups.com.
> To post to this group, send email to malhar-users@googlegroups.com.
> Visit this group at http://groups.google.com/group/malhar-users.
> For more options, visit https://groups.google.com/d/optout.
>



-- 
“I'd have blown my top, because I want to beat this damn thing,
 as long as I've gone this far. I can't just leave it after I've found
 out so much about it. I have to keep going to find out ultimately
 what is the matter with it in the end.”
                Richard P. Feynman

Re: [malhar-users] HDFS file read

Posted by Chandni Singh <ch...@datatorrent.com>.
Hey Krishna,

Can you please try split("\u0001") if you are trying to split on the
Unicode character?

Thanks,
Chandni
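
For anyone trying this suggestion, the sketch below compares the two spellings of the separator. In Java source, "\001" is an octal escape and "\u0001" is a Unicode escape, and both denote the character with code 1, so switching between them may not change the result on its own. The class and method names are hypothetical, and the row is made-up sample data.

```java
public class SeparatorEscapes {
    // Hypothetical helper: splits a row on the given separator string.
    static String[] splitOn(String row, String separator) {
        return row.split(separator);
    }

    public static void main(String[] args) {
        String octal = "\001";      // octal escape for character code 1
        String unicode = "\u0001";  // Unicode escape for the same character

        System.out.println(octal.equals(unicode));  // true
        System.out.println((int) octal.charAt(0));  // 1

        // Made-up row with three fields separated by character code 1.
        String row = "A" + unicode + "B" + unicode + "C";
        System.out.println(splitOn(row, octal).length);    // 3
        System.out.println(splitOn(row, unicode).length);  // 3
    }
}
```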

On Thu, Oct 1, 2015 at 8:50 AM, Munagala Ramanath <ra...@datatorrent.com>
wrote:

> A couple of questions:
>
> (a) Why do you have a backslash in the separator string? Are you trying to
> split on the non-printable ASCII code 1?
> (b) The first line does not have the substring "001", so what are you
> getting as the result of the split() call and what are you expecting?
> (c) The second line does have the substring "001", so again, what are you
> expecting for this line and what are you getting?

Re: [malhar-users] HDFS file read

Posted by Munagala Ramanath <ra...@datatorrent.com>.
A couple of questions:

(a) Why do you have a backslash in the separator string? Are you trying to
split on the non-printable ASCII code 1?
(b) The first line does not have the substring "001", so what are you
getting as the result of the split() call and what are you expecting?
(c) The second line does have the substring "001", so again, what are you
expecting for this line and what are you getting?
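
Since ASCII code 1 is non-printable, it is hard to tell from pasted text whether a line really contains it. A minimal sketch for making control characters visible before splitting is below. The class name ShowControlChars and the method reveal are hypothetical, and the two rows are made-up examples, not the poster's data.

```java
public class ShowControlChars {
    // Hypothetical helper: replaces each control character with a visible
    // marker such as <U+0001> so that separators show up in logs.
    static String reveal(String line) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (Character.isISOControl(c)) {
                sb.append(String.format("<U+%04X>", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String withSeparator = "A\u0001B\u0001C";
        String withoutSeparator = "ABC";

        System.out.println(reveal(withSeparator));     // A<U+0001>B<U+0001>C
        System.out.println(reveal(withoutSeparator));  // ABC

        // When the separator is absent, split() returns the whole line
        // as a single element, which looks like "one column only".
        System.out.println(withoutSeparator.split("\u0001").length);  // 1
    }
}
```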



On Thu, Oct 1, 2015 at 1:32 AM, <ka...@gmail.com> wrote:

>
> Hi,
> Below you can find the row which we are trying to split.
>
> 1855003555798283MFRAPS1858-11-17F1302015-08-282015-08-28
> 18:29:44CHG9003REGCA201508P
>
> Thanks,
> krishna

Re: [malhar-users] HDFS file read

Posted by ka...@gmail.com.
Hi,
Below you can find the row which we are trying to split.

1855003555798283MFRAPS1858-11-17F1302015-08-282015-08-28 
18:29:44CHG9003REGCA201508P

Thanks,
krishna

On Thursday, October 1, 2015 at 1:33:12 PM UTC+5:30, Ashwin Chandra Putta 
wrote:
>
> Krishna,
>
> Can you paste the line you are trying to split?
>
> Regards,
> Ashwin.

Re: [malhar-users] HDFS file read

Posted by Ashwin Chandra Putta <as...@gmail.com>.
Krishna,

Can you paste the line you are trying to split?

Regards,
Ashwin.
On Oct 1, 2015 12:22 AM, <ka...@gmail.com> wrote:

> Hi,
>
> I am using AbstractFileInputOperator to read the HDFS file. Once I get the
> first row, I try to split it on \001, but the separator is not
> recognized.
> Below is the code for reference.
>
>     String temp = br.readLine();
>     if (temp != null) {
>         arr = temp.split("\001");
>     }
>
>
> Thanks,
>
> krishna

Re: [malhar-users] HDFS file read

Posted by ka...@gmail.com.
Hi,

I am using AbstractFileInputOperator to read the HDFS file. Once I get the
first row, I try to split it on \001, but the separator is not
recognized.
Below is the code for reference.

    String temp = br.readLine();
    if (temp != null) {
        arr = temp.split("\001");
    }


Thanks,

krishna
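
A side note on the split() call in this snippet: the single-argument form drops trailing empty strings, so a row whose last columns are empty can come back with fewer elements than expected. The sketch below shows the difference with a limit of -1. The class and method names are hypothetical and the row is made-up data. This is a separate issue from getting only one column.

```java
public class TrailingColumns {
    // Hypothetical helper: default split, which discards trailing empty fields.
    static int countDefault(String row) {
        return row.split("\u0001").length;
    }

    // Hypothetical helper: a negative limit keeps trailing empty fields.
    static int countKeepEmpty(String row) {
        return row.split("\u0001", -1).length;
    }

    public static void main(String[] args) {
        // Four fields: "A", "B", and two empty trailing fields.
        String row = "A\u0001B\u0001\u0001";

        System.out.println(countDefault(row));    // 2
        System.out.println(countKeepEmpty(row));  // 4
    }
}
```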

On Wednesday, September 30, 2015 at 6:54:26 PM UTC+5:30, Tushar Gosavi 
wrote:
>
> Hi,
>
> Moving this thread to dev@apex. 
>
> Which operator are you using for reading HDFS files? If you have written
> your own operator for parsing, then can you please check your parsing logic
> separately and make sure that it works before adding it into the operator.
>
> - Tushar.