Posted to user@hbase.apache.org by Arsalan Bilal <ch...@gmail.com> on 2011/09/29 16:23:30 UTC

Problem in Input Format Class

I want to read a text file (sample given below) that is delimited by
semicolons (;) using a mapper. Each record is separated by a semicolon (;).
Should I write my own custom input format class, or
is there an existing input format class that accepts a separator?

The input file looks like this:

1;00000003;310:012:8001:01;-05:00;04:04;2010;45;56164773;3;1;0;1;
100;;;1a325416:123254:f1;16309792000;310012001001001;c0000000;;
310:012:0000401;;;;;;;;;;;;;;;;;172.16.47.33;;;310:012:555550b;;


-- 
Best Regards,
Arsalan Bilal

Re: Problem in Input Format Class

Posted by Harsh J <ha...@cloudera.com>.
Arsalan,

This isn't an HBase question; it belongs on the
mapreduce-user@hadoop.apache.org list. I'm moving it there, so let's
carry on on that list. I've added you to cc in case you are not
subscribed to that list :)

Also, to reply to your original question - No, there isn't anything in
Hadoop core's Java APIs that lets you do this 'essential' task. It
would be useful to have a delimited text input format, if you'd like
to contribute one. Perhaps use OpenCSV or such a library for good
extensibility over delimited files.
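
As a rough illustration of what the record-reading half of such a delimited
input format would have to do, here is a minimal sketch in plain Java.
DelimitedRecordReader and readAll are hypothetical names, not Hadoop classes.
The sketch uses only the JDK, ignores input splits and compression, and
assumes that newlines in the file are line wrapping rather than data.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical class for illustration; it is not a Hadoop RecordReader. */
final class DelimitedRecordReader {
    private final Reader in;
    private final char delimiter;

    DelimitedRecordReader(Reader in, char delimiter) {
        this.in = in;
        this.delimiter = delimiter;
    }

    /** Returns the next record without its delimiter, or null at end of input. */
    String next() throws IOException {
        StringBuilder record = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == delimiter) {
                return record.toString();      // may be empty, e.g. for ";;"
            }
            if (c != '\n' && c != '\r') {      // assumption: newlines are only wrapping
                record.append((char) c);
            }
        }
        // End of input: emit a final unterminated record, if there is one.
        return record.length() > 0 ? record.toString() : null;
    }

    /** Convenience wrapper that reads every record from an in-memory string. */
    static List<String> readAll(String text, char delimiter) {
        DelimitedRecordReader reader =
                new DelimitedRecordReader(new StringReader(text), delimiter);
        List<String> records = new ArrayList<>();
        try {
            for (String r = reader.next(); r != null; r = reader.next()) {
                records.add(r);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println(readAll("1;00000003;\n100;;", ';'));
    }
}
```

A real Hadoop input format would also need to handle records that straddle
split boundaries, which this sketch does not attempt.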

For the record, Pig and Hive have support for such needs in them
natively, and you can use these downstream libraries to get down and
dirty with your data quickly.




-- 
Harsh J

Re: Problem in Input Format Class

Posted by Arsalan Bilal <ch...@gmail.com>.
I am asking about an InputFormat and RecordReader that read strings
of text separated by semicolon (;) characters.





-- 
Best Regards,
Arsalan Bilal

Re: Problem in Input Format Class

Posted by Sonal Goyal <so...@gmail.com>.
Sorry, I think I got confused by the question and talked about the
OutputFormat not the input format, which is apparently what you are looking
for. Please ignore my answer. Apologies!

Best Regards,
Sonal
Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>






Re: Problem in Input Format Class

Posted by Sonal Goyal <so...@gmail.com>.
Hi Arsalan,

Are you trying to insert this data into HBase, or are you just trying to
process this log file using Hadoop? I am not sure how your question is
related to HBase, so if it is unrelated, you can seek help on the MapReduce
user list.

For an MR job, you can use TextInputFormat and specify a custom separator.
See https://issues.apache.org/jira/browse/HADOOP-3295.

Best Regards,
Sonal
Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>






Re: Problem in Input Format Class

Posted by Arsalan Bilal <ch...@gmail.com>.
No, I did not try Guava's Splitter.
I am asking about an input format class that also takes a separator,
for example: job.setInputFormatClass(<Class Format>, <Separator>);
Which input format class would support a separator here?




-- 
Best Regards,
Arsalan Bilal

RE: Problem in Input Format Class

Posted by "Buttler, David" <bu...@llnl.gov>.
Have you considered just taking the line of text as is and using Guava's Splitter?
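
A minimal sketch of that approach, assuming the mapper receives each line as a
String. It uses the JDK's String.split as a stand-in for Guava's Splitter, and
SemicolonFields is a hypothetical helper name. The limit of -1 is there because
the sample line ends with a semicolon, and String.split drops trailing empty
fields by default.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of Hadoop or Guava. */
final class SemicolonFields {
    private SemicolonFields() {}

    /** Splits one line on ';', keeping empty fields, including trailing ones. */
    static List<String> split(String line) {
        // A negative limit tells String.split to keep trailing empty strings.
        return Arrays.asList(line.split(";", -1));
    }

    public static void main(String[] args) {
        String line = "1;00000003;310:012:8001:01;-05:00;04:04;2010;45;56164773;3;1;0;1;";
        List<String> fields = split(line);
        System.out.println(fields.size() + " fields, third = " + fields.get(2));
    }
}
```

Inside a mapper, the same call would be applied to the text of each input
value before the fields are emitted or processed.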

Not sure how this is related to HBase
