
Posted to common-user@hadoop.apache.org by Sh...@cognizant.com on 2012/05/18 21:11:12 UTC

Splunk + Hadoop

Hi,

Has anyone used Hadoop with Splunk, or any other real-time processing tool over Hadoop?

Regards,
Shreya



This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient(s), please reply to the sender and destroy all copies of the original message. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this email, and/or any action taken in reliance on the contents of this e-mail is strictly prohibited and may be unlawful.

Re: Splunk + Hadoop

Posted by Edward Capriolo <ed...@gmail.com>.

A while back there was an article:
http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data

I recently did my own take on full-text searching of logs with
Solandra, though I have also prototyped using Solr inside DataStax
Enterprise.

http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/more_taco_bell_programming_with

Splunk has a graphical front end with a good deal of sophistication,
but I am quite happy just being able to search everything with Solr
and providing my own front ends built on Solr.

On Mon, May 21, 2012 at 5:13 PM, Abhishek Pratap Singh
<ma...@gmail.com> wrote:
> I have used Hadoop and Splunk both. Can you please let me know what is your
> requirement?
> Real time processing with hadoop depends upon What defines "Real time" in
> particular scenario. Based on requirement, Real time (near real time) can
> be achieved.
>
> ~Abhishek

Re: Splunk + Hadoop

Posted by Nitin Pawar <ni...@gmail.com>.

Hi Shreya,

If you are looking at data locality, then you may or may not be able to use
Hadoop out of the box.
It all depends on how you design the data layout on top of HDFS and how
you implement search based on the customer queries.

A good idea might be to put a queryable database such as MySQL in between,
where you can store the results of the data processed on Hadoop, and
then use Solr for fast access and search.
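
As a rough illustration of the "database in between" idea, here is a minimal
sketch in Python. It is only an assumption of how such a serving layer might
look: sqlite3 stands in for MySQL, and the table name, the sample rows, and
the load_results/lookup helpers are all hypothetical. The point is that the
batch job does the heavy work ahead of time, and the person on the call only
does an indexed point lookup.

```python
import sqlite3

# Hypothetical batch output: tab-separated "customer_id<TAB>summary" lines,
# the general shape a Hadoop job might write to its part files.
BATCH_OUTPUT = [
    "c001\t3 open tickets, last contact 2012-05-20",
    "c002\tno open tickets",
    "c003\t1 open ticket, refund pending",
]


def load_results(conn, lines):
    """Load precomputed batch results into an indexed lookup table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_summary ("
        "customer_id TEXT PRIMARY KEY, summary TEXT NOT NULL)"
    )
    rows = [tuple(line.split("\t", 1)) for line in lines]
    # INSERT OR REPLACE lets a later batch run overwrite older results.
    conn.executemany(
        "INSERT OR REPLACE INTO customer_summary VALUES (?, ?)", rows
    )
    conn.commit()


def lookup(conn, customer_id):
    """Point lookup by primary key; returns None if the id is unknown."""
    row = conn.execute(
        "SELECT summary FROM customer_summary WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")  # stand-in for the MySQL instance
load_results(conn, BATCH_OUTPUT)
print(lookup(conn, "c002"))
```

A full-text index such as Solr would sit alongside this for free-form
searches; the keyed table only covers lookups where the id is already known.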

Thanks,
Nitin

On Mon, May 28, 2012 at 12:41 PM, <Sh...@cognizant.com> wrote:

> Hi Abhishek,
>
> I am looking for a scenario where the customer representative needs to
> respond back to the customers on call.
> They need to search on huge data and then respond back in few seconds.
>
> Thanks and Regards,
> Shreya Pal
> Architect Technology
> Cognizant Technology Pvt Ltd
> Vnet - 205594
> Mobile - +91-9766310680
>



-- 
Nitin Pawar

RE: Splunk + Hadoop

Posted by Tom Deutsch <td...@us.ibm.com>.

Shreya - there are two major considerations here. First, can the system
process the required information, make it easily accessible, and do so
with the required accuracy for a user-based search paradigm? Second, can
the system do that fast enough to meet the time window of the use case?

It is unclear what type/source of information needs to be processed and 
then made available for retrieval, how long a search can take and still be 
considered OK, or the total latency (not just retrieval during the search 
phase) from information acquisition to being searchable. If you can share 
those details the group can help provide more specific/better coaching.

------------------------------------------------
Tom Deutsch
Program Director
Information Management
Big Data Technologies
IBM
3565 Harbor Blvd
Costa Mesa, CA 92626-1420
tdeutsch@us.ibm.com

Twitter: @thomasdeutsch
Data Management Blog: ibmdatamag.com/author/tdeutsch/
LinkedIn: http://www.linkedin.com/profile/view?id=833160
Quora: http://www.quora.com/Tom-Deutsch
Smarter Computing Blog: 
http://www.smartercomputingblog.com/contributorsprofile/?user_id=223
Big Data for Business Executives Group: 
http://www.linkedin.com/groups?gid=4455695




From:   <Sh...@cognizant.com>
To:     <co...@hadoop.apache.org>, 
Date:   05/28/2012 12:12 AM
Subject:        RE: Splunk + Hadoop



Hi Abhishek,

I am looking for a scenario where the customer representative needs to 
respond back to the customers on call.
They need to search on huge data and then respond back in few seconds.

Thanks and Regards,
Shreya Pal
Architect Technology
Cognizant Technology Pvt Ltd
Vnet - 205594
Mobile - +91-9766310680



RE: Splunk + Hadoop

Posted by Sh...@cognizant.com.

Hi Abhishek,

I am looking at a scenario where a customer representative needs to respond to customers while on a call.
They need to search a huge amount of data and respond within a few seconds.

Thanks and Regards,
Shreya Pal
Architect Technology
Cognizant Technology Pvt Ltd
Vnet - 205594
Mobile - +91-9766310680


-----Original Message-----
From: Abhishek Pratap Singh [mailto:manu.infy@gmail.com]
Sent: Tuesday, May 22, 2012 2:44 AM
To: common-user@hadoop.apache.org
Subject: Re: Splunk + Hadoop

I have used Hadoop and Splunk both. Can you please let me know what is your requirement?
Real time processing with hadoop depends upon What defines "Real time" in particular scenario. Based on requirement, Real time (near real time) can be achieved.

~Abhishek


Re: Splunk + Hadoop

Posted by Abhishek Pratap Singh <ma...@gmail.com>.

I have used both Hadoop and Splunk. Can you please let me know what your
requirement is?
Real-time processing with Hadoop depends on what "real time" means in
your particular scenario. Depending on the requirement, real time (or
near real time) can be achieved.

~Abhishek

On Fri, May 18, 2012 at 3:58 PM, Russell Jurney <ru...@gmail.com>wrote:

> Because that isn't Cube.
>
> Russell Jurney
> twitter.com/rjurney
> russell.jurney@gmail.com
> datasyndrome.com

Re: Splunk + Hadoop

Posted by Russell Jurney <ru...@gmail.com>.

Because that isn't Cube.

Russell Jurney
twitter.com/rjurney
russell.jurney@gmail.com
datasyndrome.com

On May 18, 2012, at 2:01 PM, Ravi Shankar Nair
<ra...@gmail.com> wrote:

> Why not Hbase with Hadoop?
> It's a best bet.
> Rgds, Ravi
>
> Sent from my Beethoven

Re: Splunk + Hadoop

Posted by Ravi Shankar Nair <ra...@gmail.com>.

Why not HBase with Hadoop?
It's the best bet.
Rgds, Ravi

Sent from my Beethoven 


On May 18, 2012, at 3:29 PM, Russell Jurney <ru...@gmail.com> wrote:

> I'm playing with using Hadoop and Pig to load MongoDB with data for Cube to
> consume. Cube <https://github.com/square/cube/wiki> is a realtime tool...
> but we'll be replaying events from the past.  Does that count?  It is nice
> to batch backfill metrics into 'real-time' systems in bulk.

Re: Splunk + Hadoop

Posted by Russell Jurney <ru...@gmail.com>.

I'm playing with using Hadoop and Pig to load MongoDB with data for Cube to
consume. Cube <https://github.com/square/cube/wiki> is a realtime tool...
but we'll be replaying events from the past.  Does that count?  It is nice
to batch backfill metrics into 'real-time' systems in bulk.
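
To make the backfill idea concrete, here is a minimal sketch in Python of
turning historical records into event documents in batches. Everything in it
is an assumption for illustration: the sample records, the to_event and
batches helpers, and the type/time/data field names (check the Cube wiki for
the actual event format). The one point it tries to show is that a replayed
event should carry its original timestamp rather than the time of the load.

```python
import json
from datetime import datetime, timezone

# Hypothetical historical records, e.g. rows emitted by a Pig job:
# (original timestamp in UTC, event type, payload).
HISTORY = [
    ("2012-05-01T10:00:00", "login", {"user": "a"}),
    ("2012-05-01T10:05:00", "search", {"user": "a", "terms": 2}),
    ("2012-05-02T09:30:00", "login", {"user": "b"}),
]


def to_event(ts, event_type, data):
    """Build an event document that keeps the original event time."""
    when = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S")
    when = when.replace(tzinfo=timezone.utc)
    return {"type": event_type, "time": when.isoformat(), "data": data}


def batches(events, size):
    """Yield consecutive slices of at most `size` events."""
    for start in range(0, len(events), size):
        yield events[start:start + size]


events = [to_event(*record) for record in HISTORY]
for batch in batches(events, 2):
    payload = json.dumps(batch)
    # A real loader would write `payload` to the target store here;
    # this sketch only reports the batch size.
    print(len(batch), "events,", len(payload), "bytes")
```

Batching keeps the number of writes small, which is most of the appeal of
bulk backfill over replaying events one at a time.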

On Fri, May 18, 2012 at 12:11 PM, <Sh...@cognizant.com> wrote:

> Hi ,
>
> Has anyone used Hadoop and splunk, or any other real-time processing tool
> over Hadoop?
>
> Regards,
> Shreya

Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com