You are viewing a plain text version of this content. The canonical link for it is here.
Posted to codereview@trafodion.apache.org by trinakrug <gi...@git.apache.org> on 2016/03/14 17:51:55 UTC

[GitHub] incubator-trafodion pull request: TRAFODION-1885 : online expansio...

GitHub user trinakrug opened a pull request:

    https://github.com/apache/incubator-trafodion/pull/385

    TRAFODION-1885 : online expansion

    Enhancement to allow online expansion for new nodes.  Details in JIRA TRAFODION-1885.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/trinakrug/incubator-trafodion tktraf_fixes

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-trafodion/pull/385.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #385
    
----
commit c6bc8a3fc8ed592655598cc37a5c63c81e00cf79
Author: Trina Krug <tr...@edev05.esgyn.local>
Date:   2016-03-14T16:45:38Z

    JIRA 1885 : online expansion

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-trafodion pull request: TRAFODION-1885 : online expansio...

Posted by DaveBirdsall <gi...@git.apache.org>.
Github user DaveBirdsall commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/385#discussion_r56405113
  
    --- Diff: core/sqf/monitor/linux/reqnodename.cxx ---
    @@ -0,0 +1,114 @@
    +///////////////////////////////////////////////////////////////////////////////
    +//
    +// @@@ START COPYRIGHT @@@
    +//
    +// Licensed to the Apache Software Foundation (ASF) under one
    +// or more contributor license agreements.  See the NOTICE file
    +// distributed with this work for additional information
    +// regarding copyright ownership.  The ASF licenses this file
    +// to you under the Apache License, Version 2.0 (the
    +// "License"); you may not use this file except in compliance
    +// with the License.  You may obtain a copy of the License at
    +//
    +//   http://www.apache.org/licenses/LICENSE-2.0
    +//
    +// Unless required by applicable law or agreed to in writing,
    +// software distributed under the License is distributed on an
    +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +// KIND, either express or implied.  See the License for the
    +// specific language governing permissions and limitations
    +// under the License.
    +//
    +// @@@ END COPYRIGHT @@@
    +//
    +///////////////////////////////////////////////////////////////////////////////
    +
    +#include <stdio.h>
    +#include "reqqueue.h"
    +#include "montrace.h"
    +#include "monsonar.h"
    +#include "monlogging.h"
    +#include "replicate.h"
    +#include "mlio.h"
    +
    +extern CMonitor *Monitor;
    +extern CMonStats *MonStats;
    +extern CNodeContainer *Nodes;
    +extern CReplicate Replicator;
    +extern CNode *MyNode;
    +
    +
    +CExtNodeNameReq::CExtNodeNameReq (reqQueueMsg_t msgType, int pid,
    +                          struct message_def *msg )
    +    : CExternalReq(msgType, pid, msg)
    +{
    +    // Add eyecatcher sequence as a debugging aid
    +    memcpy(&eyecatcher_, "RQEZ", 4);
    +}
    +
    +CExtNodeNameReq::~CExtNodeNameReq()
    +{
    +    // Alter eyecatcher sequence as a debugging aid to identify deleted object
    +    memcpy(&eyecatcher_, "rqez", 4);
    --- End diff --
    
    Nice technique


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-trafodion pull request: TRAFODION-1885 : online expansio...

Posted by DaveBirdsall <gi...@git.apache.org>.
Github user DaveBirdsall commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/385#discussion_r56405697
  
    --- Diff: core/sqf/monitor/linux/shell.cxx ---
    @@ -3069,6 +3072,103 @@ const char *zone_type_string( ZoneType type )
         return( str );
     }
     
    +void changeNodeName (char *current_name, char *new_name)
    +{
    +  
    +    if ((current_name == NULL) || (new_name == NULL))
    +    {
    +         printf( "[%s] Error: Invalid node name while attempting to change node name.\n", MyName );           
    +         return;
    +    }
    +    int count;
    +    MPI_Status status;
    +    CPhysicalNode  *physicalNode;
    +    PhysicalNodeNameMap_t::iterator it;
    +    pair<PhysicalNodeNameMap_t::iterator, bool> pnmit;
    +
    +    // Look up name
    +    it = PhysicalNodeMap.find( current_name );
    +
    +    if (it != PhysicalNodeMap.end())
    +    {
    +        physicalNode = it->second;
    +        if (physicalNode)
    +        {
    +           CPhysicalNode  *newPhysicalNode = new CPhysicalNode( new_name, physicalNode->GetState() );
    +           if (newPhysicalNode == NULL)
    +          {
    +               printf( "[%s] Error: Internal error with configuration while changing node name.\n", MyName );           
    +              return;
    +           }
    +           //remove and read
    +           PhysicalNodeMap.erase(current_name);
    +           pnmit = PhysicalNodeMap.insert( PhysicalNodeNameMap_t::value_type 
    +                                            ( newPhysicalNode->GetName(), newPhysicalNode ));
    +           if (pnmit.second == false)
    +           {   // Already had an entry with the given key value.  
    +                printf( "[%s] Error: Internal error while changing node name. Node name exists, node name=%s\n", MyName, new_name );
    +                return;
    --- End diff --
    
    Should we do a delete on newPhysicalNode in this code path? (Will it leak?)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-trafodion pull request: TRAFODION-1885 : online expansio...

Posted by trinakrug <gi...@git.apache.org>.
Github user trinakrug commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/385#discussion_r56402324
  
    --- Diff: core/sqf/monitor/linux/pnode.cxx ---
    @@ -1528,6 +1528,14 @@ void CNodeContainer::AddNodes( )
         int pnid;
         int rank;
         int *sparePNids = NULL;
    +    
    +    //  only relevant on a workstation acting as a single node
    +    const char* envVar = getenv("SQ_MAX_RANK"); 
    +    int maxNode;
    +    if (envVar != NULL)
    +      maxNode = atoi (envVar);
    --- End diff --
    
    This is a testing tool only.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-trafodion pull request: TRAFODION-1885 : online expansio...

Posted by trinakrug <gi...@git.apache.org>.
Github user trinakrug commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/385#discussion_r56402572
  
    --- Diff: core/sqf/monitor/linux/replicate.cxx ---
    @@ -68,7 +68,7 @@ void CReplObj::validateObj()
     // Determine the maximum size of a replication object (excluding CReplEvent)
     int CReplObj::calcAllocSize()
     {
    -    return  max(max(max(max(max(max(max(max(max(max(max(max(max(max(max(max(max(max(sizeof(CReplSoftNodeUp),
    +    return  max(max(max(max(max(max(max(max(max(max(max(max(max(max(max(max(max(max(max(sizeof(CReplSoftNodeUp),
    --- End diff --
    
    Yes, very fun :-)  I kept the code model as is with my change since I wasn't sure if there was a reason to do it this way.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-trafodion pull request: TRAFODION-1885 : online expansio...

Posted by trinakrug <gi...@git.apache.org>.
Github user trinakrug commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/385#discussion_r56402632
  
    --- Diff: core/sqf/monitor/linux/cmsh.cxx ---
    @@ -127,11 +127,14 @@ int CCmsh::GetClusterState( PhysicalNodeNameMap_t &physicalNodeMap )
                 it = physicalNodeMap.find( nodeName.c_str() );
                 if (it != physicalNodeMap.end())
                 {
    -                // TEST_POINT: to force state down on node name 
    +               // TEST_POINT and Exclude List : to force state down on node name 
                     const char *downNodeName = getenv( TP001_NODE_DOWN );
    -                if ( downNodeName != NULL && 
    -                    !strcmp( downNodeName, nodeName.c_str() ) )
    -                {
    +                const char *downNodeList = getenv( TRAF_EXCLUDE_LIST );
    +                if (( downNodeList != NULL && 
    +                    (strstr(downNodeList,nodeName.c_str()))) ||
    --- End diff --
    
    It is not a debugging tool and we do not want false positives.  I will fix this.  Good catch!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-trafodion pull request: TRAFODION-1885 : online expansio...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-trafodion/pull/385


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-trafodion pull request: TRAFODION-1885 : online expansio...

Posted by DaveBirdsall <gi...@git.apache.org>.
Github user DaveBirdsall commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/385#discussion_r56398349
  
    --- Diff: core/sqf/monitor/linux/cmsh.cxx ---
    @@ -127,11 +127,14 @@ int CCmsh::GetClusterState( PhysicalNodeNameMap_t &physicalNodeMap )
                 it = physicalNodeMap.find( nodeName.c_str() );
                 if (it != physicalNodeMap.end())
                 {
    -                // TEST_POINT: to force state down on node name 
    +               // TEST_POINT and Exclude List : to force state down on node name 
                     const char *downNodeName = getenv( TP001_NODE_DOWN );
    -                if ( downNodeName != NULL && 
    -                    !strcmp( downNodeName, nodeName.c_str() ) )
    -                {
    +                const char *downNodeList = getenv( TRAF_EXCLUDE_LIST );
    +                if (( downNodeList != NULL && 
    +                    (strstr(downNodeList,nodeName.c_str()))) ||
    --- End diff --
    
    I suppose TRAF_EXCLUDE_LIST is a debugging tool. It occurs to me though that strstr() could return false positives, for example if there were two nodes  n001.x.y and another nn001.x.y. If nn001.x.y were in the list, strstr of n001.x.y would return true. But maybe we don't care about this for debugging purposes?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-trafodion pull request: TRAFODION-1885 : online expansio...

Posted by DaveBirdsall <gi...@git.apache.org>.
Github user DaveBirdsall commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/385#discussion_r56399136
  
    --- Diff: core/sqf/monitor/linux/replicate.cxx ---
    @@ -68,7 +68,7 @@ void CReplObj::validateObj()
     // Determine the maximum size of a replication object (excluding CReplEvent)
     int CReplObj::calcAllocSize()
     {
    -    return  max(max(max(max(max(max(max(max(max(max(max(max(max(max(max(max(max(max(sizeof(CReplSoftNodeUp),
    +    return  max(max(max(max(max(max(max(max(max(max(max(max(max(max(max(max(max(max(max(sizeof(CReplSoftNodeUp),
    --- End diff --
    
    The max'es are fun. Another way to do this is to create a union of all these objects, and take the sizeof the union. The C compiler then takes the max.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-trafodion pull request: TRAFODION-1885 : online expansio...

Posted by DaveBirdsall <gi...@git.apache.org>.
Github user DaveBirdsall commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/385#discussion_r56398777
  
    --- Diff: core/sqf/monitor/linux/pnode.cxx ---
    @@ -1528,6 +1528,14 @@ void CNodeContainer::AddNodes( )
         int pnid;
         int rank;
         int *sparePNids = NULL;
    +    
    +    //  only relevant on a workstation acting as a single node
    +    const char* envVar = getenv("SQ_MAX_RANK"); 
    +    int maxNode;
    +    if (envVar != NULL)
    +      maxNode = atoi (envVar);
    --- End diff --
    
    Again, I suppose SQ_MAX_RANK is a debugging tool, so we don't bother checking for non-numeric inputs such as might happen with "export SQ_MAX_RANK=ON" (where atoi would return 0 I think).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---