Posted to dev@mahout.apache.org by Necati Batur <ne...@gmail.com> on 2010/04/09 17:43:42 UTC

GSOC Create Sql adapters proposal

Hi,

Create adapters for MySQL and NoSQL stores (HBase, Cassandra) so that all
the algorithms can access data through them.

Necati Batur ; necatibatur@gmail.com



Mahout / MAHOUT-332: Assigned mentor is Robin Anil



Proposal Abstract:

It would be useful to use Thrift as the protocol for talking to the NoSQL
systems, rather than each system's native API. That way a single abstraction
can cover NoSQL systems in general, with store-specific Thrift client
implementations added underneath to maximize code reuse. Even if only one
NoSQL client is ported at first, having this demarcation would make it
easier for others to pick up the work and port further clients.
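
As a rough illustration of the demarcation described above, here is a
minimal sketch in Python (chosen only for brevity; Mahout itself is Java).
Every name in it (DataAdapter, InMemoryAdapter, read_rows, column_sums) is
hypothetical and not taken from the Mahout code base, and the in-memory
class merely stands in for what would be a Thrift-backed HBase or Cassandra
client. No real Thrift calls are shown.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, Tuple

# A row is a key plus a mapping of column name to numeric value (assumption).
Row = Tuple[str, Dict[str, float]]


class DataAdapter(ABC):
    """Hypothetical store-neutral interface that algorithms would code against."""

    @abstractmethod
    def read_rows(self, table: str) -> Iterator[Row]:
        """Yield (row_key, {column: value}) pairs from the named table."""


class InMemoryAdapter(DataAdapter):
    """Stand-in for a Thrift-backed HBase or Cassandra client (assumption)."""

    def __init__(self, tables: Dict[str, Dict[str, Dict[str, float]]]) -> None:
        self._tables = tables

    def read_rows(self, table: str) -> Iterator[Row]:
        # Sort the keys so that iteration order is deterministic for callers.
        for key in sorted(self._tables.get(table, {})):
            yield key, dict(self._tables[table][key])


def column_sums(adapter: DataAdapter, table: str) -> Dict[str, float]:
    """Example consumer: depends only on DataAdapter, not on any one store."""
    totals: Dict[str, float] = {}
    for _, columns in adapter.read_rows(table):
        for name, value in columns.items():
            totals[name] = totals.get(name, 0.0) + value
    return totals


adapter = InMemoryAdapter(
    {"ratings": {"u1": {"item1": 4.0, "item2": 2.0}, "u2": {"item1": 1.0}}}
)
sums = column_sums(adapter, "ratings")
```

The point of the sketch is only that column_sums never names a concrete
store: porting to another NoSQL system would mean adding one more
DataAdapter subclass, leaving the algorithm code untouched.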

Detailed Description:

The data adapters for the higher-level languages will require a good
command of data structures and some knowledge of finite mathematics, and I
am confident in both areas. Also, the code in the svn repository seems to
need some improvements and documentation.

Briefly, I would do the following for this project:



   1. Understand the underlying maths for the adapters
   2. Determine the data structures that would be used for the adapters
   3. Implement the necessary methods to handle creation of these
   structures
   4. Write test cases to check whether our code covers everything
   required by a data retrieval operation
   5. Iterate on the code to make the algorithms more robust
   6. Document the project so that it ties into the overall scope

Additional Information:

First of all, I am very excited to join a program like GSOC and, most
importantly, to work for a big open source project like Apache. I am looking
forward to good collaboration and new challenges in software development.
Information management issues in particular sound great to me. I am
confident working with new technologies. I took the Data Structures I and II
courses at university, so I am comfortable with data structures. Most
importantly, I am interested in databases. From my software engineering
courses I know how to work on a project with iterative development and
timelines.