Posted to user@pig.apache.org by Chhaya Vishwakarma <Ch...@lntinfotech.com> on 2014/02/26 06:22:48 UTC

Bulk load in hbase using pig

Hi,

I have a log file in HDFS that needs to be parsed and stored in an HBase table.

I want to do this using Pig.

How can I go about it? The Pig script should parse the logs and then put them into HBase.


Regards,
Chhaya Vishwakarma


________________________________
The contents of this e-mail and any attachment(s) may contain confidential or privileged information for the intended recipient(s). Unintended recipients are prohibited from taking action on the basis of information in this e-mail and using or disseminating the information, and must notify the sender and delete it from their system. L&T Infotech will not accept responsibility or liability for the accuracy or completeness of, or the presence of any virus or disabling code in this e-mail"

Re: Bulk load in hbase using pig

Posted by yonghu <yo...@gmail.com>.
If you want to load the HBase log, why not write MapReduce jobs directly?
In Pig, you need to write your own customized load function. However, if
you write a MapReduce job, you can use the HBase API directly.


On Wed, Feb 26, 2014 at 2:15 PM, Mohammad Tariq <do...@gmail.com> wrote:

> Could you please let us know how exactly you want to parse your logs?
>
> Warm Regards,
> Tariq
> cloudfront.blogspot.com

Re: Bulk load in hbase using pig

Posted by Mohammad Tariq <do...@gmail.com>.
Could you please let us know how exactly you want to parse your logs?

Warm Regards,
Tariq
cloudfront.blogspot.com


On Wed, Feb 26, 2014 at 6:25 PM, David McNelis <dm...@gmail.com> wrote:

> The big question is how the log file needs to be parsed and formatted. I'd
> be inclined to write a UDF that takes a line of text and returns a
> tuple of the values you'd be storing in HBase.
>
> Then you could do other operations on the bag of tuples that get passed
> back.
>
> Alternatively, you could write a regular expression and use a built-in Pig
> function like REGEX_EXTRACT or REGEX_EXTRACT_ALL.
>
> I like the UDF approach in this case because then I can more easily write
> unit tests around my log parser and get that testing out of the way before
> actually spawning any jobs.

Re: Bulk load in hbase using pig

Posted by David McNelis <dm...@gmail.com>.
The big question is how the log file needs to be parsed and formatted. I'd
be inclined to write a UDF that takes a line of text and returns a
tuple of the values you'd be storing in HBase.

Then you could do other operations on the bag of tuples that get passed
back.

Alternatively, you could write a regular expression and use a built-in Pig
function like REGEX_EXTRACT or REGEX_EXTRACT_ALL.
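
As a rough illustration of the regex route, here is a minimal Python sketch
of pulling every capture group out of one log line. It uses Python's re
module, not Pig itself. The log layout (date, time, level, message) and the
names LOG_PATTERN and extract_all are assumptions made for the example,
since the thread does not say what the logs look like.

```python
import re

# Hypothetical log layout: "<date> <time> <LEVEL> <message>".
LOG_PATTERN = re.compile(r"(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) (.*)")


def extract_all(line):
    """Return every capture group as a tuple, or None if the line does not match."""
    # fullmatch requires the whole line to fit the pattern, not just a prefix.
    match = LOG_PATTERN.fullmatch(line)
    return match.groups() if match else None


if __name__ == "__main__":
    print(extract_all("2014-02-26 06:22:48 ERROR Disk quota exceeded"))
    print(extract_all("not a log line"))
```

Trying the pattern on a few sample lines like this, before it goes into a
script, is a cheap way to catch lines that would otherwise fail to match.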

I like the UDF approach in this case because then I can more easily write
unit tests around my log parser and get that testing out of the way before
actually spawning any jobs.
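
To make the testing point concrete, here is a minimal sketch in Python of a
parser kept as a plain function with unit tests around it, so it can be
checked without a cluster. The log layout and the names parse_line and
ParseLineTest are hypothetical, and wiring the function into Pig as a UDF
is not shown.

```python
import unittest


def parse_line(line):
    """Split one log line into (timestamp, level, message); None if malformed.

    Assumes the hypothetical layout "<date> <time> <LEVEL> <message>".
    """
    # Split on whitespace at most three times so the message stays intact.
    parts = line.strip().split(None, 3)
    if len(parts) < 4:
        return None
    date, time, level, message = parts
    return (date + " " + time, level, message)


class ParseLineTest(unittest.TestCase):
    def test_well_formed_line(self):
        self.assertEqual(
            parse_line("2014-02-26 06:22:48 ERROR Disk quota exceeded\n"),
            ("2014-02-26 06:22:48", "ERROR", "Disk quota exceeded"),
        )

    def test_malformed_line_is_rejected(self):
        self.assertIsNone(parse_line("truncated line"))


if __name__ == "__main__":
    unittest.main()
```

Returning None for malformed lines, rather than raising, is one possible
choice here; it lets a caller filter out bad records instead of failing.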

