Posted to common-user@hadoop.apache.org by Sh...@cognizant.com on 2012/05/18 21:11:12 UTC
Splunk + Hadoop
Hi,
Has anyone used Hadoop and Splunk, or any other real-time processing tool over Hadoop?
Regards,
Shreya
This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient(s), please reply to the sender and destroy all copies of the original message. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this email, and/or any action taken in reliance on the contents of this e-mail is strictly prohibited and may be unlawful.
Re: Splunk + Hadoop
Posted by Edward Capriolo <ed...@gmail.com>.
So a while back there was an article:
http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
I recently did my own take on full-text searching of your logs with
Solandra, though I have prototyped using Solr inside DataStax
Enterprise as well.
http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/more_taco_bell_programming_with
Splunk has a graphical front end with a good deal of sophistication,
but I am quite happy just being able to search everything with Solr and
provide my own front ends built on Solr.
On Mon, May 21, 2012 at 5:13 PM, Abhishek Pratap Singh
<ma...@gmail.com> wrote:
> I have used both Hadoop and Splunk. Can you please let me know what your
> requirement is?
> Real-time processing with Hadoop depends on what defines "real time" in a
> particular scenario. Depending on the requirement, real time (or near real
> time) can be achieved.
>
> ~Abhishek
Re: Splunk + Hadoop
Posted by Nitin Pawar <ni...@gmail.com>.
Hi Shreya,
If you are looking at data locality, then you may or may not be able to use
Hadoop out of the box.
It will all depend on how you design the data layout on top of HDFS and how
you implement search based on the customer queries.
A good idea might be to have an intermediate queryable database such as MySQL
in between, where you can store the results of the data processed on Hadoop,
and then use Solr for fast access and search.
Thanks,
Nitin
On Mon, May 28, 2012 at 12:41 PM, <Sh...@cognizant.com> wrote:
> Hi Abhishek,
>
> I am looking at a scenario where a customer representative needs to
> respond to customers while on a call.
> They need to search a huge amount of data and respond within a few seconds.
>
> Thanks and Regards,
> Shreya Pal
> Architect Technology
> Cognizant Technology Pvt Ltd
> Vnet - 205594
> Mobile - +91-9766310680
>
--
Nitin Pawar
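The layout Nitin describes above (batch processing on Hadoop, results kept in
a queryable database, a search index in front) can be illustrated with a small
sketch. Everything here is an assumption made for illustration: plain Python
stands in for the Hadoop job, sqlite3 stands in for MySQL, a tiny in-memory
inverted index stands in for Solr, and the table name, function names, and
record shape are hypothetical. It only shows why lookups can be fast once the
heavy work has been done ahead of time in batch.

```python
import sqlite3
from collections import defaultdict


def batch_summarize(records):
    """Stand-in for the batch (Hadoop) job: aggregate raw events per customer."""
    summary = defaultdict(lambda: {"events": 0, "last_note": ""})
    for customer_id, note in records:
        summary[customer_id]["events"] += 1
        summary[customer_id]["last_note"] = note  # keep the most recent note
    return summary


def load_serving_store(conn, summary):
    """Stand-in for the intermediate database: store precomputed results."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_summary ("
        "customer_id TEXT PRIMARY KEY, events INTEGER, last_note TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO customer_summary VALUES (?, ?, ?)",
        [(cid, s["events"], s["last_note"]) for cid, s in summary.items()],
    )
    conn.commit()


def build_index(summary):
    """Stand-in for the search index: map each term to matching customer ids."""
    index = defaultdict(set)
    for cid, s in summary.items():
        for term in s["last_note"].lower().split():
            index[term].add(cid)
    return index


def search(conn, index, term):
    """Resolve a term through the index, then fetch the precomputed rows."""
    ids = sorted(index.get(term.lower(), ()))
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    query = (
        "SELECT customer_id, events, last_note FROM customer_summary "
        f"WHERE customer_id IN ({placeholders}) ORDER BY customer_id"
    )
    return conn.execute(query, ids).fetchall()


if __name__ == "__main__":
    raw = [
        ("c1", "billing dispute opened"),
        ("c2", "router replaced"),
        ("c1", "billing dispute resolved"),
    ]
    summary = batch_summarize(raw)
    conn = sqlite3.connect(":memory:")
    load_serving_store(conn, summary)
    index = build_index(summary)
    print(search(conn, index, "billing"))
```

The query path never touches the raw events: it only reads the index and the
precomputed table, which is the property that makes a few-second response
plausible. How fresh the answers are depends on how often the batch step runs.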
RE: Splunk + Hadoop
Posted by Tom Deutsch <td...@us.ibm.com>.
Shreya - there are two major considerations here. First, can the system
process the required information, make it easily accessible, and do that
with the required accuracy for a user-based search paradigm? Second, can
the system do that fast enough to meet the time window of the use case?
It is unclear what type/source of information needs to be processed and
then made available for retrieval, how long a search can take and still be
considered OK, or what the total latency (not just retrieval during the
search phase) from information acquisition to being searchable should be.
If you can share those details, the group can provide more specific and
better guidance.
------------------------------------------------
Tom Deutsch
Program Director
Information Management
Big Data Technologies
IBM
3565 Harbor Blvd
Costa Mesa, CA 92626-1420
tdeutsch@us.ibm.com
Twitter: @thomasdeutsch
Data Management Blog: ibmdatamag.com/author/tdeutsch/
LinkedIn: http://www.linkedin.com/profile/view?id=833160
Quora: http://www.quora.com/Tom-Deutsch
Smarter Computing Blog:
http://www.smartercomputingblog.com/contributorsprofile/?user_id=223
Big Data for Business Executives Group:
http://www.linkedin.com/groups?gid=4455695
From: <Sh...@cognizant.com>
To: <co...@hadoop.apache.org>,
Date: 05/28/2012 12:12 AM
Subject: RE: Splunk + Hadoop
Hi Abhishek,
I am looking at a scenario where a customer representative needs to
respond to customers while on a call.
They need to search a huge amount of data and respond within a few seconds.
Thanks and Regards,
Shreya Pal
Architect Technology
Cognizant Technology Pvt Ltd
Vnet - 205594
Mobile - +91-9766310680
RE: Splunk + Hadoop
Posted by Sh...@cognizant.com.
Hi Abhishek,
I am looking at a scenario where a customer representative needs to respond to customers while on a call.
They need to search a huge amount of data and respond within a few seconds.
Thanks and Regards,
Shreya Pal
Architect Technology
Cognizant Technology Pvt Ltd
Vnet - 205594
Mobile - +91-9766310680
-----Original Message-----
From: Abhishek Pratap Singh [mailto:manu.infy@gmail.com]
Sent: Tuesday, May 22, 2012 2:44 AM
To: common-user@hadoop.apache.org
Subject: Re: Splunk + Hadoop
I have used both Hadoop and Splunk. Can you please let me know what your requirement is?
Real-time processing with Hadoop depends on what defines "real time" in a particular scenario. Depending on the requirement, real time (or near real time) can be achieved.
~Abhishek
Re: Splunk + Hadoop
Posted by Abhishek Pratap Singh <ma...@gmail.com>.
I have used both Hadoop and Splunk. Can you please let me know what your
requirement is?
Real-time processing with Hadoop depends on what defines "real time" in a
particular scenario. Depending on the requirement, real time (or near real
time) can be achieved.
~Abhishek
On Fri, May 18, 2012 at 3:58 PM, Russell Jurney <ru...@gmail.com> wrote:
> Because that isn't Cube.
>
> Russell Jurney
> twitter.com/rjurney
> russell.jurney@gmail.com
> datasyndrome.com
Re: Splunk + Hadoop
Posted by Russell Jurney <ru...@gmail.com>.
Because that isn't Cube.
Russell Jurney
twitter.com/rjurney
russell.jurney@gmail.com
datasyndrome.com
On May 18, 2012, at 2:01 PM, Ravi Shankar Nair
<ra...@gmail.com> wrote:
> Why not HBase with Hadoop?
> It's your best bet.
> Rgds, Ravi
>
> Sent from my Beethoven
Re: Splunk + Hadoop
Posted by Ravi Shankar Nair <ra...@gmail.com>.
Why not HBase with Hadoop?
It's your best bet.
Rgds, Ravi
Sent from my Beethoven
On May 18, 2012, at 3:29 PM, Russell Jurney <ru...@gmail.com> wrote:
> I'm playing with using Hadoop and Pig to load MongoDB with data for Cube to
> consume. Cube <https://github.com/square/cube/wiki> is a realtime tool...
> but we'll be replaying events from the past. Does that count? It is nice
> to batch backfill metrics into 'real-time' systems in bulk.
Re: Splunk + Hadoop
Posted by Russell Jurney <ru...@gmail.com>.
I'm playing with using Hadoop and Pig to load MongoDB with data for Cube to
consume. Cube <https://github.com/square/cube/wiki> is a realtime tool...
but we'll be replaying events from the past. Does that count? It is nice
to batch backfill metrics into 'real-time' systems in bulk.
On Fri, May 18, 2012 at 12:11 PM, <Sh...@cognizant.com> wrote:
> Hi,
>
> Has anyone used Hadoop and Splunk, or any other real-time processing tool
> over Hadoop?
>
> Regards,
> Shreya
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com