Posted to common-dev@hadoop.apache.org by gs...@tce.edu on 2010/02/26 07:52:37 UTC

How Hadoop Works??

Hi all,
    While studying how the Hadoop framework works, I noticed that
MapReduce in turn uses Apache Lucene to build indexes for newly
scheduled data and Solr to create instances. Is that right?
Thanks,
Sujitha






RE: Map-Reduce for Security

Posted by "Segel, Mike" <ms...@navteq.com>.
Suji,

I'm only a couple of months into using Map/Reduce, but I think you have a couple of issues.
What do you mean by 'security'? ('Security' can mean different things to different people.)

Map/Reduce works at the 'row' level.
So if you were to encrypt the data, you'd have to encrypt it on a row-by-row basis, and you'd have to work out how to recognize the end of each 'row'.

Then in your map() method, you'd have to decrypt the input, do something, and encrypt the output.
In your reduce() method, you'd likewise have to decrypt, do something, and then encrypt the output.

This will secure your data. How well is determined by how you design your MapReduce jobs.
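
Concretely, the map side might look something like the sketch below (not from this thread: the "demo.aes.key" property and the upper-casing step are made up for illustration, and the default AES mode is used only to keep it short; don't treat this as production crypto):

    // Sketch: a Mapper that decrypts each Base64-encoded input row,
    // transforms it, and re-encrypts the result before emitting it.
    import java.util.Base64;
    import javax.crypto.Cipher;
    import javax.crypto.spec.SecretKeySpec;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class EncryptedRowMapper
        extends Mapper<LongWritable, Text, Text, Text> {

      private SecretKeySpec key;

      @Override
      protected void setup(Context context) {
        // "demo.aes.key" is a hypothetical property; real key material
        // should come from a secure store, not plain job configuration.
        byte[] raw = Base64.getDecoder()
            .decode(context.getConfiguration().get("demo.aes.key"));
        key = new SecretKeySpec(raw, "AES");
      }

      @Override
      protected void map(LongWritable offset, Text row, Context context)
          throws java.io.IOException, InterruptedException {
        try {
          // 1. Decrypt the row.
          Cipher cipher = Cipher.getInstance("AES");
          cipher.init(Cipher.DECRYPT_MODE, key);
          String plain = new String(
              cipher.doFinal(Base64.getDecoder().decode(row.toString())),
              "UTF-8");

          // 2. Do something with the plaintext (a stand-in transform).
          String result = plain.toUpperCase();

          // 3. Re-encrypt before writing the output.
          cipher.init(Cipher.ENCRYPT_MODE, key);
          context.write(new Text("row"), new Text(
              Base64.getEncoder().encodeToString(
                  cipher.doFinal(result.getBytes("UTF-8")))));
        } catch (java.security.GeneralSecurityException e) {
          throw new java.io.IOException(e);
        }
      }
    }

One more point on your original question: compress first, then encrypt. Well-encrypted data looks random, so compressing it afterwards buys you almost nothing.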

HTH

-Mike






Map-Reduce for Security

Posted by Sujitha <gs...@tce.edu>.
Hi all,

In order to secure the content of HDFS, is it right to encrypt and
compress the data before storing it?

Is there any way to use MapReduce to secure data in Hadoop?

Thanks

Regards
Suji






Re: How Hadoop Works??

Posted by Lukáš Vlček <lu...@gmail.com>.
BTW, you can also take a look at this book (a complete draft of all
chapters is available for download for free):
http://www.umiacs.umd.edu/~jimmylin/book.html

Lukas


Re: How Hadoop Works??

Posted by Lukáš Vlček <lu...@gmail.com>.
Hi,

If you are serious about Hadoop, then I can warmly recommend the book by
Tom White: http://www.hadoopbook.com/
(Disclaimer: I am not paid for this plug; I make it just because I found
Tom's book valuable and worth buying.)

Regards,
Lukas


Re: Terminal Level Authentication

Posted by Allen Wittenauer <aw...@linkedin.com>.
On Oct 13, 2010, at 4:00 AM, Sujitha wrote:

> 
> Hi all,
>        As a part of my research I am trying to authenticate users. For
> this I created browser-level Kerberos authentication.

	SPNEGO or something custom?


> After that,
> I identified issues related to cookies on the browser side.
>        So shall I move towards terminal-level authentication?
> Suggestions, please.

kinit (username) will obtain a TGT for the user.
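
If you go the terminal/keytab route, the programmatic equivalent inside a Hadoop client looks roughly like this (a sketch; the principal name and keytab path are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosLogin {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Switch the Hadoop client from simple auth to Kerberos.
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Keytab login: the in-process analogue of running kinit.
        UserGroupInformation.loginUserFromKeytab(
            "sujitha@EXAMPLE.COM", "/etc/security/keytabs/sujitha.keytab");

        System.out.println("Logged in as "
            + UserGroupInformation.getLoginUser().getUserName());
      }
    }

An interactive kinit in the shell populates the ticket cache, and the Hadoop client will pick the TGT up from there as well.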


Terminal Level Authentication

Posted by Sujitha <gs...@tce.edu>.
Hi all,
        As a part of my research I am trying to authenticate users. For
this I created browser-level Kerberos authentication. After that,
I identified issues related to cookies on the browser side.
        So shall I move towards terminal-level authentication?
Suggestions, please.
Sujitha





Re: How Hadoop Works??

Posted by Sujitha <gs...@tce.edu>.
> No, Hadoop does not use Lucene.

I have read this:
http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data

Given that article:

 The way the current Hadoop-based system works is:
 Raw logs get streamed from hundreds of mail servers to the Hadoop
 Distributed File System ("HDFS") in real time.
 MapReduce jobs are scheduled to run to index the new data using Apache
 Lucene and Solr.
 Once the indexes have been built, they are compressed and stored away in
 HDFS.
 Each Hadoop datanode runs a Tomcat servlet container, which hosts a number
 of Solr instances that pull and merge the new indexes, and provide really
 fast search results to our support team.

> And do you mean Solr combines Lucene and Hadoop?

No. Solr (a search server) uses Lucene (a library) for the actual search:
Solr needs Lucene to perform full-text indexing, searching, and so on.
Am I right?
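
For reference, here is a minimal sketch of Lucene used directly as a library, which is the part Solr wraps with a server: it builds a tiny in-memory index and runs a full-text search against it (the field name and sample text are made up):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.ByteBuffersDirectory;
    import org.apache.lucene.store.Directory;

    public class LuceneLibraryDemo {
      public static void main(String[] args) throws Exception {
        // Index one document in memory; this is pure Lucene, no Solr.
        Directory dir = new ByteBuffersDirectory();
        IndexWriter writer = new IndexWriter(
            dir, new IndexWriterConfig(new StandardAnalyzer()));
        Document doc = new Document();
        doc.add(new TextField(
            "body", "raw mail server log line", Field.Store.YES));
        writer.addDocument(doc);
        writer.close();

        // Full-text search over the index we just built.
        IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(dir));
        TopDocs hits = searcher.search(
            new TermQuery(new Term("body", "log")), 10);
        System.out.println("matches: " + hits.scoreDocs.length);
      }
    }

Solr adds the HTTP server, schema, caching, and replication on top; Hadoop itself ships none of this, which is why the Rackspace setup wires them together explicitly.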




-- 
Suji




Re: How Hadoop Works??

Posted by Jeff Zhang <zj...@gmail.com>.
No, Hadoop does not use Lucene.

And do you mean Solr combines Lucene and Hadoop?





-- 
Best Regards

Jeff Zhang