Posted to common-user@hadoop.apache.org by Mark Kerzner <ma...@gmail.com> on 2011/02/28 22:01:52 UTC

Advice for a new open-source project and a license

Hi,

I am working on an open-source project that would be using
Hadoop/HDFS/HBase/Tika/Lucene and would make all files on a hard drive
searchable. Like Nutch, only applied to hard drives, and like Google Desktop
Search, only I want to output information about every file found. Not a big
difference though.

I am looking for advice on the following:

   1. Have you heard of a similar project?
   2. What license should I use? I am thinking of Apache V2.0, because it
   relies on other Apache V2.0 projects;
   3. Any other advice?

Thank you. Sincerely,
Mark

Re: Advice for a new open-source project and a license

Posted by Mark Kerzner <ma...@gmail.com>.
Thank you, Ted, indeed, not exactly what I am doing - which is even more
encouraging. Solr may be used for making the results easily available, so
thank you for pointing that out.

Cheers,
Mark

On Mon, Feb 28, 2011 at 5:28 PM, Ted Dunning <td...@maprtech.com> wrote:

>
> Check out http://www.elasticsearch.org/
>
> Not what you are doing, but possibly a
> helpful bit of the pie.
>
> Also, Solr integrates Tika and Lucene pretty nicely these days.  No HBase
> yet, but it isn't hard to add that.
>
> On Mon, Feb 28, 2011 at 1:01 PM, Mark Kerzner <ma...@gmail.com> wrote:
>
>> Hi,
>>
>> I am working on an open-source project that would be using
>> Hadoop/HDFS/HBase/Tika/Lucene and would make all files on a hard drive
>> searchable. Like Nutch, only applied to hard drives, and like Google
>> Desktop Search, only I want to output information about every file found.
>> Not a big difference though.
>>
>> I am looking for advice on the following:
>>
>>   1. Have you heard of a similar project?
>>   2. What license should I use? I am thinking of Apache V2.0, because it
>>   relies on other Apache V2.0 projects;
>>   3. Any other advice?
>>
>> Thank you. Sincerely,
>> Mark
>>
>
>

Re: Advice for a new open-source project and a license

Posted by Ted Dunning <td...@maprtech.com>.
Check out http://www.elasticsearch.org/

Not what you are doing, but possibly a
helpful bit of the pie.

Also, Solr integrates Tika and Lucene pretty nicely these days.  No HBase yet,
but it isn't hard to add that.

On Mon, Feb 28, 2011 at 1:01 PM, Mark Kerzner <ma...@gmail.com> wrote:

> Hi,
>
> I am working on an open-source project that would be using
> Hadoop/HDFS/HBase/Tika/Lucene and would make all files on a hard drive
> searchable. Like Nutch, only applied to hard drives, and like Google
> Desktop Search, only I want to output information about every file found.
> Not a big difference though.
>
> I am looking for advice on the following:
>
>   1. Have you heard of a similar project?
>   2. What license should I use? I am thinking of Apache V2.0, because it
>   relies on other Apache V2.0 projects;
>   3. Any other advice?
>
> Thank you. Sincerely,
> Mark
>

Re: Advice for a new open-source project and a license

Posted by Mark Kerzner <ma...@gmail.com>.
Bixo looks nice, if not for this, then for other projects :)  - and no less
than by Ken Krugler - it must be good!

Why is there no license on the project in GitHub, and what is the current
level of activity?

Thank you,
Mark

On Tue, Mar 1, 2011 at 2:38 AM, Ted Dunning <td...@maprtech.com> wrote:

> Bixo may have some useful components.  The thrust is different, but some of
> the pieces are similar.
>
> http://bixo.101tec.com/
>
>
> On Mon, Feb 28, 2011 at 7:57 PM, Mark Kerzner <ma...@gmail.com> wrote:
>
>> Well, it's more complex than that. I packed all files (or selected
>> directories) into zip files, and those zip files go into HDFS, and they
>> are processed from there.
>>
>> Mark
>>
>> On Mon, Feb 28, 2011 at 9:53 PM, Greg Roelofs <ro...@yahoo-inc.com> wrote:
>>
>> > Mark Kerzner <ma...@gmail.com> wrote:
>> >
>> > > I am working on an open-source project that would be using
>> > > Hadoop/HDFS/HBase/Tika/Lucene and would make all files on a hard drive
>> > > searchable.
>> >
>> > _A_ hard drive?  Hadoop?  Seems like a bad match.
>> >
>> > Greg
>> >
>>
>
>

Re: Advice for a new open-source project and a license

Posted by Ted Dunning <td...@maprtech.com>.
Bixo may have some useful components.  The thrust is different, but some of
the pieces are similar.

http://bixo.101tec.com/

On Mon, Feb 28, 2011 at 7:57 PM, Mark Kerzner <ma...@gmail.com> wrote:

> Well, it's more complex than that. I packed all files (or selected
> directories) into zip files, and those zip files go into HDFS, and they are
> processed from there.
>
> Mark
>
> On Mon, Feb 28, 2011 at 9:53 PM, Greg Roelofs <ro...@yahoo-inc.com> wrote:
>
> > Mark Kerzner <ma...@gmail.com> wrote:
> >
> > > I am working on an open-source project that would be using
> > > Hadoop/HDFS/HBase/Tika/Lucene and would make all files on a hard drive
> > > searchable.
> >
> > _A_ hard drive?  Hadoop?  Seems like a bad match.
> >
> > Greg
> >
>

Re: Advice for a new open-source project and a license

Posted by Mark Kerzner <ma...@gmail.com>.
Well, it's more complex than that. I packed all files (or selected
directories) into zip files, and those zip files go into HDFS, and they are
processed from there.

Mark
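
For readers following the thread, the packing step described above could be
sketched roughly as follows. This is only an illustration in Python using the
standard zipfile module, not the project's actual code: the function names
pack_directory and iter_entries are hypothetical, and the sketch assumes the
archive is written outside the directory being packed. Copying the resulting
archive into HDFS is left out; it could be done, for example, with
"hadoop fs -put".

```python
import os
import zipfile


def pack_directory(src_dir, zip_path):
    """Pack every regular file under src_dir into one zip archive.

    Hypothetical helper: assumes zip_path lies outside src_dir, so the
    archive does not try to include itself. Returns the relative paths
    of the files that were stored.
    """
    stored = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # Store paths relative to src_dir so the archive is portable.
                rel_path = os.path.relpath(full_path, src_dir)
                archive.write(full_path, arcname=rel_path)
                stored.append(rel_path)
    return stored


def iter_entries(zip_path):
    """Yield (name, content_bytes) for each file in the archive.

    A processing job could consume entries one at a time like this
    instead of unpacking the whole archive first.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if not info.is_dir():
                yield info.filename, archive.read(info)
```

Packing many small files into a few large archives is one common way to avoid
storing large numbers of tiny files in HDFS, which is presumably part of the
motivation here.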

On Mon, Feb 28, 2011 at 9:53 PM, Greg Roelofs <ro...@yahoo-inc.com> wrote:

> Mark Kerzner <ma...@gmail.com> wrote:
>
> > I am working on an open-source project that would be using
> > Hadoop/HDFS/HBase/Tika/Lucene and would make all files on a hard drive
> > searchable.
>
> _A_ hard drive?  Hadoop?  Seems like a bad match.
>
> Greg
>

Re: Advice for a new open-source project and a license

Posted by Greg Roelofs <ro...@yahoo-inc.com>.
Mark Kerzner <ma...@gmail.com> wrote:

> I am working on an open-source project that would be using
> Hadoop/HDFS/HBase/Tika/Lucene and would make all files on a hard drive
> searchable.

_A_ hard drive?  Hadoop?  Seems like a bad match.

Greg