Posted to dev@vxquery.apache.org by Eldon Carman <ec...@ucr.edu> on 2015/09/11 23:49:28 UTC

HDFS Status

I wanted to check in and see how the HDFS work is going. I see there are
two pull requests up on GitHub, but they are almost a month old now.

First, have the reviewers' comments been addressed? When do you think we can
merge the HDFS read code (with documentation) into VXQuery?

Second, how is progress going with YARN? As we get closer to doing a
large-scale test, I would like to have all the HDFS and YARN code in master
before starting the test. Soon after the test, we can do a big release.

Thanks for the status update.
Preston

Re: HDFS Status

Posted by Efi <ef...@gmail.com>.
Hello Preston and everyone,

Exams are going well, so I finally got to look at the diagram. As we had
discussed, this is exactly what YARN cluster management should do. After
looking at the HBase example Eldon sent me, I believe it could be done
with Slider.

About the merge of the pull requests: everything is ready as we
discussed, with only one change. The user will define the HDFS paths as
hdfs://user/hduser/xml ... but for the local filesystem they will not
have to add any prefix like file:// to the path. That way they can still
use relative paths with the local filesystem and will not have to change
all the queries that already exist to comply with the new rules.
I believe this is the best solution, because right now there are a lot
of queries in the test suite that would need to change to match the new
system if we required file:// in front of the local paths.
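
To make the proposed rule concrete, here is a minimal Python sketch. It
is only an illustration of the convention described above, not the code
in the pull requests; the names HDFS_PREFIX and classify_path are
hypothetical.

```python
# Hypothetical illustration of the proposed path convention:
# paths starting with "hdfs://" are read from HDFS, and every other path
# is treated as a local filesystem path with no prefix required.
HDFS_PREFIX = "hdfs://"


def classify_path(path: str) -> str:
    """Return "hdfs" for paths with the hdfs:// prefix, "local" otherwise."""
    if path.startswith(HDFS_PREFIX):
        return "hdfs"
    # No file:// prefix is needed, so existing relative paths keep working.
    return "local"


if __name__ == "__main__":
    for p in ("hdfs://user/hduser/xml", "data/books.xml", "/tmp/xml"):
        print(p, "->", classify_path(p))
```

With this rule, a query that already uses a relative local path is
classified as local without any change to the query.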

Please let me know what you think,

Best regards,
Efi

On 09/14/2015 08:37 PM, Preston Carman wrote:
> Hi Efi,
>
> Thanks for the update. Hope you do well on your exams. It's good to know
> you're making progress. Also, I have a new YARN cluster diagram for you [1].
> The diagram has been updated since our last conversation. I keep coming
> back to the idea that the VXQuery cluster should not be any different for
> YARN or for a local configuration. I know we are using Slider for managing
> the YARN cluster and wanted to get your feedback on this diagram. Is this
> possible? If not, what are the complications?
>
> Python Scripts Workflow
> - Define server configuration
> - Deploy VXQuery cluster
> - Start VXQuery cluster
> --- Run queries using the CLI
> - Stop VXQuery cluster
>
> Suggested YARN Workflow
> - Define YARN configuration
> - Start YARN VXQuery cluster job
> --- Run queries using the CLI
> - Stop YARN VXQuery cluster job
>
> As you have worked with Slider, is this a possible workflow? A similar
> workflow is used by AsterixDB in YARN, so I know it's possible using a
> custom model. Thanks for your feedback.
>
> Preston
>
> [1]
> https://docs.google.com/drawings/d/1PZbvJk-G0J3hQffd-fFr2n893bXSNg3xfXFexM5c2A8/edit?usp=sharing


Re: HDFS Status

Posted by Preston Carman <pr...@apache.org>.
Hi Efi,

Thanks for the update. Hope you do well on your exams. It's good to know
you're making progress. Also, I have a new YARN cluster diagram for you [1].
The diagram has been updated since our last conversation. I keep coming
back to the idea that the VXQuery cluster should not be any different for
YARN or for a local configuration. I know we are using Slider for managing
the YARN cluster and wanted to get your feedback on this diagram. Is this
possible? If not, what are the complications?

Python Scripts Workflow
- Define server configuration
- Deploy VXQuery cluster
- Start VXQuery cluster
--- Run queries using the CLI
- Stop VXQuery cluster

Suggested YARN Workflow
- Define YARN configuration
- Start YARN VXQuery cluster job
--- Run queries using the CLI
- Stop YARN VXQuery cluster job

As you have worked with Slider, is this a possible workflow? A similar
workflow is used by AsterixDB in YARN, so I know it's possible using a
custom model. Thanks for your feedback.
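
The idea that the cluster should look the same under YARN and under a
local configuration can be pictured with a minimal Python sketch. All
class and function names below are hypothetical, and no real Slider,
YARN, or VXQuery commands are invoked; each step is represented only by
a descriptive string taken from the two workflows above.

```python
from abc import ABC, abstractmethod
from typing import List


class ClusterBackend(ABC):
    """Hypothetical common interface for starting and stopping a cluster."""

    @abstractmethod
    def start(self) -> str: ...

    @abstractmethod
    def stop(self) -> str: ...


class LocalBackend(ClusterBackend):
    """Stands in for the Python scripts workflow (deploy, then start)."""

    def start(self) -> str:
        return "deploy and start VXQuery cluster"

    def stop(self) -> str:
        return "stop VXQuery cluster"


class YarnBackend(ClusterBackend):
    """Stands in for the suggested YARN workflow (one cluster job)."""

    def start(self) -> str:
        return "start YARN VXQuery cluster job"

    def stop(self) -> str:
        return "stop YARN VXQuery cluster job"


def run_session(backend: ClusterBackend, queries: List[str]) -> List[str]:
    """Return the ordered steps; the query step is identical for both backends."""
    steps = [backend.start()]
    steps += [f"run query using the CLI: {q}" for q in queries]
    steps.append(backend.stop())
    return steps


if __name__ == "__main__":
    for backend in (LocalBackend(), YarnBackend()):
        print(run_session(backend, ["example.xq"]))
```

In this sketch only the start and stop steps differ between the two
backends, while running queries through the CLI stays the same.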

Preston

[1]
https://docs.google.com/drawings/d/1PZbvJk-G0J3hQffd-fFr2n893bXSNg3xfXFexM5c2A8/edit?usp=sharing


On Sun, Sep 13, 2015 at 9:04 AM, Efi <ef...@gmail.com> wrote:

> Hello Preston,
>
>     Sorry for the delay. About the GitHub pull requests: there is one
> issue left to be resolved, the custom system path that needs to be set in
> the cluster configuration file at Maven build time. I am working on it
> right now, and we have scheduled a meeting with Steven for next week to
> merge the pull request.
>
>     About YARN: I still haven't started working on it, but I will after I
> am done with the Maven issue. I have my university exams this month, so I
> do not have as much time as before. After I am done with exams, if it
> is not resolved by then, I will have more time to work on it.
>
> Best regards,
> Efi

Re: HDFS Status

Posted by Efi <ef...@gmail.com>.
Hello Preston,

     Sorry for the delay. About the GitHub pull requests: there is one
issue left to be resolved, the custom system path that needs to be set
in the cluster configuration file at Maven build time. I am working on
it right now, and we have scheduled a meeting with Steven for next week
to merge the pull request.

     About YARN: I still haven't started working on it, but I will after
I am done with the Maven issue. I have my university exams this month,
so I do not have as much time as before. After I am done with exams,
if it is not resolved by then, I will have more time to work on it.

Best regards,
Efi

On 09/12/2015 12:49 AM, Eldon Carman wrote:
> I wanted to check in and see how the HDFS work is going. I see there are
> two pull requests up on GitHub, but they are almost a month old now.
>
> First, have the reviewers' comments been addressed? When do you think we can
> merge the HDFS read code (with documentation) into VXQuery?
>
> Second, how is progress going with YARN? As we get closer to doing a
> large-scale test, I would like to have all the HDFS and YARN code in master
> before starting the test. Soon after the test, we can do a big release.
>
> Thanks for the status update.
> Preston
>