Posted to dev@vxquery.apache.org by "Hamza Zafar (JIRA)" <ji...@apache.org> on 2015/04/21 11:27:58 UTC
[jira] [Issue Comment Deleted] (VXQUERY-131) Supporting Hadoop data and cluster management

     [ https://issues.apache.org/jira/browse/VXQUERY-131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hamza Zafar updated VXQUERY-131:
--------------------------------
    Comment: was deleted

(was: Dear Preston,

My Background:
I am Hamza Zafar, a final-year undergraduate student of computer science at NUST, Pakistan. I have been a student researcher at the HPC research center in my department. At the HPC lab we focus on developing and maintaining an open-source, Java-based MPI, MPJ Express: http://mpj-express.org/

Open-source Contributions:
For my final-year project, I worked on Apache Hadoop and the MPJ Express project. The project requires writing a new runtime for MPJ Express to bootstrap its processes on a Hadoop YARN cluster. The new runtime will use the Hadoop YARN resource manager to dynamically allocate memory and CPU resources. As much enterprise data now resides on the Hadoop Distributed File System (HDFS), this project will enable enterprises to combine the performance of HPC with the usability and flexibility of the Big Data stack. Development of the MPJ Express YARN runtime is complete, and I am working on releasing the software in the next few weeks. A research paper is currently under review at ICCS.

My Thoughts about the VXQuery and YARN project:
I have no past experience with the VXQuery project (I hope to learn it), but I am comfortable writing YARN applications. I anticipate that this project is geared towards replacing the Python scripts that launch VXQuery jobs with the YARN resource manager. YARN can spawn containers in the cluster, and the containers can then run queries on XML data files residing in HDFS. The Application Master can be very handy for rescheduling failed containers and maintaining the running ones.

Looking forward to working on this project :)

Yours Sincerely
Hamza Zafar
LinkedIn:  pk.linkedin.com/pub/hamza-zafar/59/739/205/ 
)

> Supporting Hadoop data and cluster management
> ---------------------------------------------
>
>                 Key: VXQUERY-131
>                 URL: https://issues.apache.org/jira/browse/VXQUERY-131
>             Project: VXQuery
>          Issue Type: Improvement
>            Reporter: Preston Carman
>            Assignee: Preston Carman
>              Labels: gsoc, gsoc2015, hadoop, java, mentor, xml
>
> Many organizations support Hadoop. It would be nice to be able to read data from this source. The project will include creating a strategy (with the mentor's guidance) for reading XML data from HDFS and implementing it. When connecting VXQuery to HDFS, the strategy may need to consider how to read sections of an XML file.
> In addition, we could use YARN as our cluster manager. Apache Hadoop YARN (Yet Another Resource Negotiator) would be a good cluster management tool for VXQuery. If VXQuery can read data from HDFS, then why not also manage the cluster with a tool provided by Hadoop? The solution would replace the current custom Python scripts for cluster management.
> Goal
> - Read XML from HDFS
> - Manage the VXQuery cluster with YARN
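
The issue's remark about reading sections of an XML file can be illustrated with a small sketch. It is not VXQuery's or Hadoop's implementation: the function name records_in_split is hypothetical, an in-memory byte string stands in for a file on HDFS, and it assumes records are delimited by a known tag that carries no attributes and is not nested inside itself. The idea is that each reader owns a byte range and emits every record whose opening tag begins inside that range, reading past the end of the range if needed to finish the record.

```python
from typing import Iterator


def records_in_split(data: bytes, start: int, end: int, tag: bytes) -> Iterator[bytes]:
    """Yield each <tag>...</tag> record whose opening tag begins in [start, end).

    Hypothetical helper: a record that starts inside the range is read to its
    closing tag even when that tag lies beyond `end`, so neighbouring ranges
    never emit the same record twice and no record is lost at a boundary.
    """
    open_tag = b"<" + tag + b">"
    close_tag = b"</" + tag + b">"
    pos = start
    while True:
        begin = data.find(open_tag, pos)
        if begin == -1 or begin >= end:
            return  # no further record starts inside this range
        finish = data.find(close_tag, begin)
        if finish == -1:
            return  # truncated record: nothing complete left to emit
        finish += len(close_tag)
        yield data[begin:finish]
        pos = finish


if __name__ == "__main__":
    doc = b"<root><rec>a</rec><rec>bb</rec><rec>ccc</rec></root>"
    # Arbitrary byte ranges standing in for file blocks.
    for lo, hi in [(0, 20), (20, 40), (40, len(doc))]:
        print((lo, hi), list(records_in_split(doc, lo, hi, b"rec")))
```

A real reader would seek within a file stream instead of slicing bytes, and would need to handle attributes and nested elements, which this sketch deliberately ignores.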



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)