Feedback nodes in Error cluster
Some time back I posted a VI called "Simple State Machine template". Based on
comments from fellow members I have redone it and am attaching it
herewith. (LV7.1 + Win XP Professional)
If you open it and check the block diagram, you will notice that
LV has introduced a feedback node in the error cluster wire looping between
the DAQmx Write Digital Output and Read Analog Input functions. I am not
sure why LV does this, even though the two functions have no common
reference or relevance between them.
You can also try this: just remove the data lines to the two Digital
Out functions, along with the error cluster wire that has the feedback node. Rewire
the error cluster, this time without the feedback node. Now if
you try to reconnect the Digital Out data lines, you will find that the
data lines are drawn with feedback nodes!!
I am sure LV is trying to tell me something - only I don't understand what it is. Can someone elaborate on this?
Thanks
Raghunathan
Message Edited by Raghunathan on 07-04-2005 08:29 PM
Raghunathan
LV2012 to Automate Hydraulic Test rigs.
Attachments:
PRP_Main.vi 288 KB
You have an impossible loop in your code (run it with execution highlighting to verify):
Dig8-15 relies on data from the case structure.
Analog input 0-7 provides data into the big case structure.
This means that Dig8-15 cannot execute until the case has executed, but the case must wait for data from the analog input. LabVIEW is smart enough to insert a feedback node such that AI0-7 gets the error from the previous iteration. This is NOT a desirable situation.
Without the feedback node, your code is broken: AI0-7 cannot run because it must wait for DO8-15 to execute, and DO8-15 cannot execute because it must wait for AI0-7.
You should:
Uncheck the diagram option "Auto insert feedback nodes in cycles" so that you get broken wires, which makes the problem easier to find.
Fix your dataflow. Making sure your wires flow left to right makes errors like this less likely. You need to wire the error clusters in the order the subVIs execute.
I hope this is clear enough, but please ask if you continue to have problems. Good luck!
Message Edited by altenbach on 07-04-2005 07:18 PM
LabVIEW Champion. Do more with less code and in less time.
Similar Messages
-
Hi there!
I have a problem which is making me very angry already :)
I have two Exchange 2010 SP3 servers with the Mailbox role, running on Windows Server 2012. I decided to create a DAG.
I have created the prestaged AD object for the cluster called msc-co-exc-01c, assigned the necessary permissions and disabled it. I allowed traffic between the nodes through the Windows Firewall and prepared the File Share Witness server.
Then I tried to add the nodes. The first node was added successfully, but the second node doesn't want to be added :). Now I can add only one node to the DAG. I tried adding different servers first, but only the first one was ever added.
Logs on the second node:
Application Log
"Failed to initialize cluster with error 0x80004005." (MSExchangeIS)
Failover Clustering Diagnostic Log
"[VER] Could not read version data from database for node msc-co-exc-04v (id 1)."
CMDLET Error:
Summary: 1 item(s). 0 succeeded, 1 failed.
Elapsed time: 00:06:21
MSC-CO-EXC-02V
Failed
Error:
A database availability group administrative operation failed. Error: The operation failed. CreateCluster errors may result from incorrectly configured static addresses. Error: An error occurred while attempting a cluster operation. Error: Cluster API '"AddClusterNode()
(MaxPercentage=100) failed with 0x5b4. Error: This operation returned because the timeout period expired"' failed. [Server: msc-co-exc-04v.int.krls.ru]
An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API '"AddClusterNode() (MaxPercentage=100) failed with 0x5b4. Error: This operation returned because the timeout period expired"' failed..
This operation returned because the timeout period expired
Click here for help... http://technet.microsoft.com/en-US/library/ms.exch.err.default(EXCHG.141).aspx?v=14.3.174.1&t=exchgf1&e=ms.exch.err.ExC9C315
Warning:
Network name 'msc-co-exc-01c' is not online. Please check that the IP address configuration for the database availability group is correct.
Warning:
The operation wasn't successful because an error was encountered. You may find more details in log file "C:\ExchangeSetupLogs\DagTasks\dagtask_2014-11-17_13-54-56.543_add-databaseavailabiltygroupserver.log".
Exchange Management Shell command attempted:
Add-DatabaseAvailabilityGroupServer -MailboxServer 'MSC-CO-EXC-02V' -Identity 'msc-co-exc-01c'
Elapsed Time: 00:06:21
UPD:
when the Exchange servers run on the same Hyper-V node, the DAG works well, but if I move one of the VMs to another node, it stops working.
I have installed Wireshark and captured the traffic of the cluster interface. When the DAG members are on the same Hyper-V node, there is inbound and outbound traffic on the cluster interface, but if I move one DAG member to another node, Wireshark shows only outbound traffic on both nodes.
This confuses me, because there is normal connectivity between these DAG members through the main interface.
Please, help me if you can.
Hi, Jared! Thank you for the reply.
Of course I did it already :) I have new info: when the Exchange servers run on the same Hyper-V node, the DAG works well, but if I move one of the VMs to another node, it stops working. I installed Wireshark and captured the traffic of the cluster interface: when the DAG members are on the same Hyper-V node there is inbound and outbound traffic on the cluster interface, but if I move one DAG member to another node I see only outbound traffic on both nodes. This confuses me, because there is normal connectivity between these DAG members through the main interface. -
Does the error from the error cluster appear at the Invoke Node?
I use "Open VI Reference" with an Invoke Node set to "Run VI" with "Wait Until Done = True". If the VI runs and has an error, does the error return to the caller/server?
No, you will need to use a Call By Reference node to run the VI; then you can see the error cluster (if it is wired to the called VI's connector pane). Another option is to use the "Get Control Value" VI Server method on the VI to return the "error out" cluster of the called VI after the Run method has completed.
-Jim -
I'm trying to create a simple counter using LabVIEW 8.5 and the NXT. I'm using a feedback node, and it does not show an error while editing the VI. When I try to download the VI to the NXT I get the following error:
ERROR: An error occurred during parsing. Node was unable to be parsed. ((Class: FeedbackNode) (VI:Teleop.vi))
List of errors:
ERROR: An error occurred during parsing. Node was unable to be parsed.
When I click on the error it highlights the feedback node (arrow pointing left).
Sean
I found the problem after reading this document:
ftp://ftp.ni.com/evaluation/mindstorms/LabVIEW_for_NXT_Advanced_Programming_Guide.pdf
Feedback nodes are not supported in a while loop. But, you can convert it to a shift register and it will function.
Sean -
Should a .NET Invoke node fill the error cluster?
Situation: a DLL called by LabVIEW attempts to communicate with an instrument and cannot, because the instrument is not present. The DLL returns a standard LabVIEW error cluster with no error.
Person A says this is wrong: it must return an error, because further modules in the chain cannot succeed without communication, and it is possible to return an error by throwing an exception.
Person B says: the program ran and did not die. This is our module, not a LabVIEW function, so the error cluster should report no error - and it cannot return an error anyway.
The questions are: who is right, A or B? And, since it has already gotten to an 'is too, is not' kind of disagreement, where is any supporting documentation to reference that will authoritatively resolve the issue?
Solved!
Go to Solution.
In order for the dll to produce an error in LabVIEW, it must be programmed to throw an exception if no instrument is available. LabVIEW will then convert this exception into a generic error with code 1172 (see attached article). LabVIEW can be programmed to give a custom error message using the attached example. In this case, it should be possible to have the .NET dll throw an exception, and then have LabVIEW generate a custom error message to display to the user. In this instance, it appears that Person A is correct. Hope this helps resolve the issue.
LabVIEW and .NET Exceptions
http://digital.ni.com/public.nsf/allkb/B15CE9F2715434C386256D3500601878
Programmatically Generate Custom Error Codes/Messages
https://decibel.ni.com/content/docs/DOC-9557
Regards,
Chris L
Applications Engineer
National Instruments
Certified LabVIEW Associate Developer -
Error cluster constant appears different in two locations on a block diagram
I am a newbie to LabVIEW. I have taken Core 1 & 2 and Machine Vision and I have not come across this before.
The image on the right is obviously an error cluster constant used in the block diagram to create an error cluster and wire it to an error out terminal. As far as I can tell the image on the left is about the same thing, but why does it look different? The different appearance raises a concern that there is a difference in behavior that I do not understand. LabVIEW help suggests that both are error constants. When I create a new error constant, it always ends up looking like the right image above. I have not been able to create something that looks like the image on the left.
Could someone please confirm what the image on the left is on a block diagram?
Thanks,
Bill
Solved!
Go to Solution.
The image on the left is an error cluster control. It has a front panel presence and can be set either via the front panel or through a property node or local variable. The image on the right is an error cluster constant. It is a static value.
Mark Yedinak
"Does anyone know where the love of God goes when the waves turn the minutes to hours?"
Wreck of the Edmund Fitzgerald - Gordon Lightfoot -
How to find out the IP@s of all nodes in a cluster?
Is there any way to retrieve the IP addresses of all nodes in a cluster?
The problem is the following. We intend to write an administration program
that administers all nodes of a cluster using rmi (e.g. tell all singletons
in the cluster to reload configuration values etc.). My understanding is
that rmi only talks to a single node in a cluster. It would be a convenient
feature if the administration program could figure out all nodes in a
cluster by itself and then administers each node sequentially. So far we're
planning to pass all IP addresses to the administration program e.g. as
command line arguments but what if a node gets left out due to human error?
Thanks for your help.
Bernie
There is no public interface to inquire about the IP addresses of the servers in a cluster. If you use WLS 6.0, there is an administrative console that uses JMX to manage the cluster. Perhaps that would be of use to you?
Bernhard Lenz wrote:
Is there any way to retrieve the IP addresses of all nodes in a cluster?
The problem is the following. We intend to write an administration program
that administers all nodes of a cluster using rmi (e.g. tell all singletons
in the cluster to reload configuration values etc.). My understanding is
that rmi only talks to a single node in a cluster. It would be a convenient
feature if the administration program could figure out all nodes in a
cluster by itself and then administers each node sequentially. So far we're
planning to pass all IP addresses to the administration program e.g. as
command line arguments but what if a node gets left out due to human error?
Thanks for your help.
Bernie -
RAC Installation Problem (shared across all the nodes in the cluster)
All experts
I am trying to install Oracle 10.2.0 RAC on Red Hat 4.7.
Ref: http://www.oracle-base.com/articles/10g/OracleDB10gR2RACInstallationOnLinux
All steps completed successfully on all nodes (rac1, rac2); everything is okay on each node.
A single-node RAC installation succeeds.
When I try to install on two nodes, on the "Specify Oracle Cluster Registry (OCR) Location" screen I get the error:
The location /nfsmounta/crs.configuration is not shared across all the nodes in the cluster. Specify a shared raw partition or cluster file system file that is visible by the same name on all nodes of the cluster.
I create shared disks on all nodes as:
1. First we need to set up some NFS shares. Create shared disks on a NAS or a third server if you have one available. Otherwise, create the following directories on the RAC1 node.
mkdir /nfssharea
mkdir /nfsshareb
2. Add the following lines to the /etc/exports file. (edit /etc/exports)
/nfssharea *(rw,sync,no_wdelay,insecure_locks,no_root_squash)
/nfsshareb *(rw,sync,no_wdelay,insecure_locks,no_root_squash)
3. Run the following command to export the NFS shares.
chkconfig nfs on
service nfs restart
4. On both RAC1 and RAC2 create some mount points to mount the NFS shares to.
mkdir /nfsmounta
mkdir /nfsmountb
5. Add the following lines to the "/etc/fstab" file. The mount options are suggestions from Kevin Closson.
nas:/nfssharea /nfsmounta nfs rw,bg,hard,nointr,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0 0 0
nas:/nfsshareb /nfsmountb nfs rw,bg,hard,nointr,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0 0 0
6. Mount the NFS shares on both servers.
mount /mount1
mount /mount2
7. Create the shared CRS Configuration and Voting Disk files.
touch /nfsmounta/crs.configuration
touch /nfsmountb/voting.disk
Please guide me as to what is wrong.
I think you did not really mount it on the second server. What is the output of 'ls /nfsmounta'?
Step 6 should be 'mount /nfsmounta', not 'mount /mount1'. I also don't know if simply creating a zero-size file is sufficient for the OCR (I have always used raw devices, not NFS, for this).
Unable to see other OC4J nodes in a cluster
I have installed 2 instances of OracleAS on 2 separate machines; both machines (Lnx-5 and Lnx-6) were installed with the J2EE and Web components.
During installation, I selected Lnx-5 as the administration node of the cluster, and I configured the discovery address using multicast address 225.0.0.33:8001.
There were no installation errors, and things seem to work fine.
However, Lnx-5 can't "see" Lnx-6 as one of its cluster nodes. On both Lnx-5 and Lnx-6, I see the following when I issue "opmnctl @cluster status".
---- On Lnx-5 , here is what I got ---------
[root@Lnx-5 conf]# opmnctl @cluster status
Processes in Instance: Lnx5.anydomain.com
--------------------------------------------------------------+---------
ias-component | process-type | pid | status
--------------------------------------------------------------+---------
OC4JGroup:default_group | OC4J:home | 5392 | Alive
ASG | ASG | N/A | Down
HTTP_Server | HTTP_Server | 5391 | Alive
---- On Lnx-6 , here is what I got ---------
[root@Lnx-6 conf]# opmnctl @cluster status
Processes in Instance: Lnx6.anydomain.com
--------------------------------------------------------------+---------
ias-component | process-type | pid | status
--------------------------------------------------------------+---------
OC4JGroup:default_group | OC4J:home | 5392 | Alive
ASG | ASG | N/A | Down
HTTP_Server | HTTP_Server | 5391 | Alive
I suppose I should see both Lnx-5 and Lnx-6 when I issue the command on either node.
I have also verified that both machines are synchronized to the NTP server.
I have also run tcpdump on both nodes, and I can indeed see multicast (225.0.0.33:8001) packets arriving at both nodes.
I really need some help figuring out what could have gone wrong and what information I should look at to address this issue.
Thanks in advance!!
Ok, for the discovery server configuration, here is the config that I have in the opmn.xml file; both lnx-5 and lnx-6 use exactly the same configuration:
<notification-server interface="ipv4">
<port local="6101" remote="6201" request="6004"/>
<ssl enabled="true" wallet-file="$ORACLE_HOME/opmn/conf/ssl.wlt/default"/>
<topology>
<discover list="10.1.230.11:6201,10.1.230.12:6201"/>
</topology>
</notification-server>
the ip address of Lnx-5 is 10.1.230.11, and Lnx-6 is 10.1.230.12.
Once this was configured on both Lnx-5 and Lnx-6, I kept seeing this error in Lnx-6's log file:
07/05/16 22:10:18 [pm-process] Process Alive: default_group~home~default_group~1
(1542677438:3859)
07/05/16 22:10:18 [pm-requests] Request 2 Completed. Command: /start
07/05/16 22:13:25 [ons-connect] Connection 9,10.1.230.11,6201 connect (Connectio
n refused)
07/05/16 22:13:26 [ons-connect] Connection a,10.1.230.12,6201 connect (Connectio
n refused)
Well, once I enabled debugging, some errors were reported when opmn started; the errors are as follows:
Loading Module libopmnohs callback functions
Module libopmnohs: loaded callback function opmnModInitialize
Module libopmnohs: unable to load callback function opmnModSetNumProcs
Module libopmnohs: unable to load callback function opmnModParse
Module libopmnohs: unable to load callback function opmnModDebug
Module libopmnohs: unable to load callback function opmnModDepend
Module libopmnohs: loaded callback function opmnModStart
Module libopmnohs: unable to load callback function opmnModReady
Module libopmnohs: loaded callback function opmnModNotify
Module libopmnohs: loaded callback function opmnModRestart
Module libopmnohs: loaded callback function opmnModStop
Module libopmnohs: loaded callback function opmnModPing
Module libopmnohs: loaded callback function opmnModProcRestore
Module libopmnohs: loaded callback function opmnModProcComp
Module libopmnohs: unable to load callback function opmnModReqComp
Module libopmnohs: unable to load callback function opmnModCall
Module libopmnohs: unable to load callback function opmnModInfo
Module libopmnohs: unable to load callback function opmnModCron
Module libopmnohs: loaded callback function opmnModTerminate
Loading Module libopmnoc4j callback functions
Module libopmnoc4j: loaded callback function opmnModInitialize
Module libopmnoc4j: unable to load callback function opmnModSetNumProcs
Module libopmnoc4j: loaded callback function opmnModParse
Module libopmnoc4j: unable to load callback function opmnModDebug
Module libopmnoc4j: unable to load callback function opmnModDepend
Module libopmnoc4j: loaded callback function opmnModStart
Module libopmnoc4j: unable to load callback function opmnModReady
Module libopmnoc4j: loaded callback function opmnModNotify
Module libopmnoc4j: loaded callback function opmnModRestart
Module libopmnoc4j: loaded callback function opmnModStop
Module libopmnoc4j: loaded callback function opmnModPing
Module libopmnoc4j: loaded callback function opmnModProcRestore
Module libopmnoc4j: loaded callback function opmnModProcComp
Module libopmnoc4j: unable to load callback function opmnModReqComp
Module libopmnoc4j: unable to load callback function opmnModCall
Module libopmnoc4j: unable to load callback function opmnModInfo
Module libopmnoc4j: unable to load callback function opmnModCron
Module libopmnoc4j: loaded callback function opmnModTerminate
Loading Module libopmncustom callback functions
Module libopmncustom: loaded callback function opmnModInitialize
Module libopmncustom: unable to load callback function opmnModSetNumProcs
Module libopmncustom: loaded callback function opmnModParse
Module libopmncustom: loaded callback function opmnModDebug
Module libopmncustom: unable to load callback function opmnModDepend
Module libopmncustom: loaded callback function opmnModStart
Module libopmncustom: loaded callback function opmnModReady
Module libopmncustom: unable to load callback function opmnModNotify
Module libopmncustom: loaded callback function opmnModRestart
Module libopmncustom: loaded callback function opmnModStop
Module libopmncustom: loaded callback function opmnModPing
Module libopmncustom: loaded callback function opmnModProcRestore
Module libopmncustom: loaded callback function opmnModProcComp
Module libopmncustom: loaded callback function opmnModReqComp
Module libopmncustom: unable to load callback function opmnModCall
Module libopmncustom: unable to load callback function opmnModInfo
Module libopmncustom: unable to load callback function opmnModCron
Module libopmncustom: loaded callback function opmnModTerminate
Loading Module libopmniaspt callback functions
Module libopmniaspt: loaded callback function opmnModInitialize
Module libopmniaspt: unable to load callback function opmnModSetNumProcs
Module libopmniaspt: unable to load callback function opmnModParse
Module libopmniaspt: unable to load callback function opmnModDebug
Module libopmniaspt: unable to load callback function opmnModDepend
Module libopmniaspt: loaded callback function opmnModStart
Module libopmniaspt: loaded callback function opmnModReady
Module libopmniaspt: unable to load callback function opmnModNotify
Module libopmniaspt: unable to load callback function opmnModRestart
Module libopmniaspt: loaded callback function opmnModStop
Module libopmniaspt: unable to load callback function opmnModPing
Module libopmniaspt: unable to load callback function opmnModProcRestore
Module libopmniaspt: loaded callback function opmnModProcComp
Module libopmniaspt: unable to load callback function opmnModReqComp
Module libopmniaspt: unable to load callback function opmnModCall
Module libopmniaspt: unable to load callback function opmnModInfo
Module libopmniaspt: unable to load callback function opmnModCron
Module libopmniaspt: loaded callback function opmnModTerminate
Looks pretty bad... What causes those errors to happen? Are they related?
Thanks!! -
Pass an error cluster in and out of a C/C++ dll?
Hi all,
I'd like to know if it is possible to pass a LabVIEW error cluster to a C/C++ function in a DLL. This would greatly help error handling in the different VIs.
I am able to access and modify the first two members of the error cluster - the error status and the error code, which are boolean and integer, respectively - but I cannot modify the string. LabVIEW crashes completely when I do.
I first define a structure in C++ like this:
const int N = 512;
#pragma pack(push,1)
typedef struct lvcluster {
bool status;
int code;
char source[N];
} lvcluster;
#pragma pack(pop)
Then, I define a function that will access the members status, code and source:
int TestCluster(lvcluster *err)
{
    err->code = 1;
    err->status = false;
    sprintf(err->source, "Test");
    return 0;
}
I then use LabVIEW's "Call Library Function" to call this dll's function. I have set the parameter "err" to "Adapt to Type" and "Handles by Value". Trying to write characters to the source array crashes LabVIEW.
Is this possible at all? How should it be done?
Thanks!
Thanks all for the comments.
I've been looking at extcode.h, where I saw the definition of an LStrHandle. It seems to be a pointer to a pointer to a "character array":
typedef struct {
int cnt; /* number of bytes that follow */
unsigned char str[1]; /* cnt bytes */
} LStr, *LStrPtr, **LStrHandle;
The "character array" is different from a C character array; see http://www.ni.com/white-paper/4877/en/#toc4
The first 4 bytes contain a signed 32-bit integer holding the number of characters, and there is no NUL-termination character.
So the error structure should be something like this (modulo the size of boolean, thanks rolfk):
const int N = 512;
#pragma pack(push,1)
typedef struct lvcluster {
bool status;
int32 code;
LStrHandle source;
} lvcluster;
#pragma pack(pop)
From there, I was able to access a LabVIEW string from C, but I am unable to modify it. I might be able to change the characters of an already-allocated string, but resizing or even creating a new string crashes LabVIEW. As reported by others, manipulating these strings would require linking against LabVIEW's library to access its string manipulation functions, but this is not possible because the DLL must be independent of LabVIEW.
The only last possible way I can think of is to allocate a new cluster inside the DLL. Then I might be able to change the string in it, and hopefully LabVIEW would pick it up. I don't know how LabVIEW manages its memory; would it garbage collect the input cluster that is not used anymore?
Thanks for all the feedback. -
I have come across a nasty bug that causes LabVIEW 2010 SP1 (running Windows 7 Ultimate x64) to crash without any warning.
To replicate the bug, do the following:
Add a numeric control and an indicator to the front panel
Switch to the block diagram and add a feedback node
Connect the initializer terminal of the feedback node to the output of the control
Now do ANY of the following to trigger the bug:
Press the run button (which is broken due to the unconnected input of the feedback node); it will turn into a normal run arrow without displaying the error
Do any extra action and undo it; the run button will turn from "list errors" to normal
So far the VI can be saved normally. Now connect the output of the feedback node to the indicator and try any of the following:
Save the VI
Close the VI
Create a new project and select to add the VI to the project
This will cause LabVIEW to crash without any notice!
At step 4 the bug is present but harmless. Once you combine it with step 5 (connect to the indicator), the bug becomes active and causes the crash. I have attached a snapshot of how the front panel/block diagram look before saving (since the VI can't be saved). Notice how the run button is enabled although the input of the feedback node is not connected.
I have tried to replicate the error in LabVIEW 2009 but couldn't.
Attachments:
FBN Bug.jpg 56 KB
Dear ªL¡,
Thank you for bringing this issue to our attention.
I replicated it on LabVIEW 2010 SP1 and confirmed that it has been fixed in LabVIEW 2011.
Thanks again!
Best regards,
Mateusz Stokłosa
Applications Engineer
National Instruments -
Question about adding an Extra Node to SOFS cluster
Hi, I have a fully functioning SOFS cluster with two nodes; it uses SAN FC storage, not SAS JBODs, and is running about 100 VMs in production at the moment.
Both my nodes currently sit on one blade chassis, but for resiliency I want to add another node from a blade chassis in our secondary, smaller on-site DC.
I've done plenty of cluster node upgrades before on SQL and Hyper-V, but never with a SOFS cluster.
I have the third node fully prepared: it can see the disks/FC LUNs on the SAN (using PowerPath and Disk Manager) and all the roles are installed.
So in theory I can just add this node in Cluster Manager and it should all be good. My question is: has anyone else done this, is there anything else I should be aware of, and what's the best way to check that the new node will function and be able to migrate the file server role over without issues? I know I can run a validation when adding the node; I presume this is the best option?
I cannot find much information on the web about expanding a SOFS cluster.
Any advice or information would be gratefully received!!
cheers
Mark
Hi Mark,
Sorry for the delay in reply.
As you said, there is not much information related to adding a node to a SOFS cluster.
The only one I could find is related to System Center (VMM):
How to Add a Node to a Scale-Out File Server in VMM
http://technet.microsoft.com/en-us/library/dn466530.aspx
However, adding a node to a SOFS cluster should be as simple as what you have just prepared. You can give it a try and see the result.
If you have any feedback on our support, please send to [email protected] -
Adding node back into cluster after removal...
Hi,
I removed a cluster node using "scconf -r -h <node>" (after carrying out all the other usual removal steps to get this command to work).
Because this is a pair+1 cluster and the node I was trying to remove was physically attached to the quorum device (SCSI), I had to create a dummy node before the removal command above would work.
I reinstalled Solaris, the SC3.1u4 framework, patches etc., and then tried to run scinstall again on the node (after reintroducing the node to the cluster using scconf -a -T node=<node>).
However, during scinstall I got the following problem:
Updating file ("ntp.conf.cluster") on node n20-2-sup ... done
Updating file ("hosts") on node n20-2-sup ... done
Updating file ("ntp.conf.cluster") on node n20-3-sup ... done
Updating file ("hosts") on node n20-3-sup ... done
scrconf: RPC: Unknown host
scinstall: Failed communications with "bogusnode"
scinstall: scinstall did NOT complete successfully!
Press Enter to continue:
I was not sure what to do at this point, but since the other cluster nodes could now see my 'new' node again, I removed the dummy node, rebooted the new node and said a little prayer...
Now my node will not boot as part of the cluster:
Rebooting with command: boot
Boot device: /pci@8,600000/SUNW,qlc@4/fp@0,0/disk@w21000004cfa3e691,0:a File and args:
SunOS Release 5.10 Version Generic_127111-06 64-bit
Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Hostname: n20-1-sup
/usr/cluster/bin/scdidadm: Could not load DID instance list.
Cannot open /etc/cluster/ccr/did_instances.
Booting as part of a cluster
NOTICE: CMM: Node n20-1-sup (nodeid = 1) with votecount = 0 added.
NOTICE: CMM: Node n20-2-sup (nodeid = 2) with votecount = 2 added.
NOTICE: CMM: Node n20-3-sup (nodeid = 3) with votecount = 1 added.
NOTICE: CMM: Node bogusnode (nodeid = 4) with votecount = 0 added.
NOTICE: clcomm: Adapter qfe5 constructed
NOTICE: clcomm: Path n20-1-sup:qfe5 - n20-2-sup:qfe5 being constructed
NOTICE: clcomm: Path n20-1-sup:qfe5 - n20-3-sup:qfe5 being constructed
NOTICE: clcomm: Adapter qfe1 constructed
NOTICE: clcomm: Path n20-1-sup:qfe1 - n20-2-sup:qfe1 being constructed
NOTICE: clcomm: Path n20-1-sup:qfe1 - n20-3-sup:qfe1 being constructed
NOTICE: CMM: Node n20-1-sup: attempting to join cluster.
NOTICE: clcomm: Path n20-1-sup:qfe1 - n20-2-sup:qfe1 being initiated
NOTICE: CMM: Node n20-2-sup (nodeid: 2, incarnation #: 1205318308) has become reachable.
NOTICE: clcomm: Path n20-1-sup:qfe1 - n20-2-sup:qfe1 online
NOTICE: clcomm: Path n20-1-sup:qfe5 - n20-3-sup:qfe5 being initiated
NOTICE: CMM: Node n20-3-sup (nodeid: 3, incarnation #: 1205265086) has become reachable.
NOTICE: clcomm: Path n20-1-sup:qfe5 - n20-3-sup:qfe5 online
NOTICE: clcomm: Path n20-1-sup:qfe1 - n20-3-sup:qfe1 being initiated
NOTICE: clcomm: Path n20-1-sup:qfe1 - n20-3-sup:qfe1 online
NOTICE: clcomm: Path n20-1-sup:qfe5 - n20-2-sup:qfe5 being initiated
NOTICE: clcomm: Path n20-1-sup:qfe5 - n20-2-sup:qfe5 online
NOTICE: CMM: Cluster has reached quorum.
NOTICE: CMM: Node n20-1-sup (nodeid = 1) is up; new incarnation number = 1205346037.
NOTICE: CMM: Node n20-2-sup (nodeid = 2) is up; new incarnation number = 1205318308.
NOTICE: CMM: Node n20-3-sup (nodeid = 3) is up; new incarnation number = 1205265086.
NOTICE: CMM: Cluster members: n20-1-sup n20-2-sup n20-3-sup.
NOTICE: CMM: node reconfiguration #18 completed.
NOTICE: CMM: Node n20-1-sup: joined cluster.
NOTICE: CMM: Node (nodeid = 4) with votecount = 0 removed.
NOTICE: CMM: Cluster members: n20-1-sup n20-2-sup n20-3-sup.
NOTICE: CMM: node reconfiguration #19 completed.
WARNING: clcomm: per node IP config clprivnet0:-1 (349): 172.16.193.1 failed with 19
WARNING: clcomm: per node IP config clprivnet0:-1 (349): 172.16.193.1 failed with 19
cladm: CLCLUSTER_ENABLE: No such device
UNRECOVERABLE ERROR: Sun Cluster boot: Could not initialize cluster framework
Please reboot in non cluster mode(boot -x) and Repair
syncing file systems... done
WARNING: CMM: Node being shut down.
Program terminated
{1} ok
Any ideas how i can recover this situation without having to reinstall the node again?
(have a flash with OS, sc3.1u4 framework etc... so not the end of the world but...)
Thanks a mil if you can help here!
- headwreckedHi - got sorted with this problem...
basically I just removed (scinstall -r) the sc3.1u4 software from the node that was not booting, and then re-installed the software. This time the dummy node had already been removed, so the install did not try to contact that node and scinstall completed without any errors.
I think the only problem with the procedure I originally used to remove and re-add the node was that I forgot to remove the dummy node before re-adding the actual cluster node.
If anyone can confirm this to be the case then great; if not, well, it's working now, so this thread can be closed.
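For the record, the order that seems to have worked here can be sketched as shell steps (a sketch, not a verified procedure: the dummy node name is a placeholder, and the `scconf -r -h` syntax is my assumption for removing a node from the Sun Cluster 3.1 configuration — check the man pages on your release):

```shell
# On a surviving cluster node: remove the stale dummy node from the
# cluster configuration FIRST, so later installs don't try to reach it.
scconf -r -h node=dummy-node   # "dummy-node" is a placeholder name

# On the broken node (booted non-cluster with boot -x): remove the
# Sun Cluster 3.1u4 framework software and configuration.
scinstall -r

# Then re-run interactive scinstall on the broken node to add it back
# to the existing cluster (the transcript below shows the options
# that were used in this thread).
scinstall
```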
root@n20-1-sup # /usr/cluster/bin/scinstall -r
Verifying that no unexpected global mounts remain in /etc/vfstab ... done
Verifying that no device services still reference this node ... done
Archiving the following to /var/cluster/uninstall/uninstall.1036/archive:
/etc/cluster ...
/etc/path_to_inst ...
/etc/vfstab ...
/etc/nsswitch.conf ...
Updating vfstab ... done
The /etc/vfstab file was updated successfully.
The original entry for /global/.devices/node@1 has been commented out.
And, a new entry has been added for /globaldevices.
Mounting /dev/dsk/c3t0d0s6 on /globaldevices ... done
Attempting to contact the cluster ...
Trying "n20-2-sup" ... okay
Trying "n20-3-sup" ... okay
Attempting to unconfigure n20-1-sup from the cluster ... failed
Please consider the following warnings:
scrconf: Failed to remove node (n20-1-sup).
scrconf: All two-node clusters must have at least one shared quorum device.
Additional housekeeping may be required to unconfigure
n20-1-sup from the active cluster.
Removing the "cluster" switch from "hosts" in /etc/nsswitch.conf ... done
Removing the "cluster" switch from "netmasks" in /etc/nsswitch.conf ... done
** Removing Sun Cluster framework packages **
Removing SUNWkscspmu.done
Removing SUNWkscspm..done
Removing SUNWksc.....done
Removing SUNWjscspmu.done
Removing SUNWjscspm..done
Removing SUNWjscman..done
Removing SUNWjsc.....done
Removing SUNWhscspmu.done
Removing SUNWhscspm..done
Removing SUNWhsc.....done
Removing SUNWfscspmu.done
Removing SUNWfscspm..done
Removing SUNWfsc.....done
Removing SUNWescspmu.done
Removing SUNWescspm..done
Removing SUNWesc.....done
Removing SUNWdscspmu.done
Removing SUNWdscspm..done
Removing SUNWdsc.....done
Removing SUNWcscspmu.done
Removing SUNWcscspm..done
Removing SUNWcsc.....done
Removing SUNWscrsm...done
Removing SUNWscspmr..done
Removing SUNWscspmu..done
Removing SUNWscspm...done
Removing SUNWscva....done
Removing SUNWscmasau.done
Removing SUNWscmasar.done
Removing SUNWmdmu....done
Removing SUNWmdmr....done
Removing SUNWscvm....done
Removing SUNWscsam...done
Removing SUNWscsal...done
Removing SUNWscman...done
Removing SUNWscgds...done
Removing SUNWscdev...done
Removing SUNWscnmu...done
Removing SUNWscnmr...done
Removing SUNWscscku..done
Removing SUNWscsckr..done
Removing SUNWscu.....done
Removing SUNWscr.....done
Removing the following:
/etc/cluster ...
/dev/did ...
/devices/pseudo/did@0:* ...
The /etc/inet/ntp.conf file has not been updated.
You may want to remove it or update it after uninstall has completed.
The /var/cluster directory has not been removed.
Among other things, this directory contains
uninstall logs and the uninstall archive.
You may remove this directory once you are satisfied
that the logs and archive are no longer needed.
Log file - /var/cluster/uninstall/uninstall.1036/log
root@n20-1-sup #
Ran the scinstall again:
>>> Confirmation <<<
Your responses indicate the following options to scinstall:
scinstall -ik \
-C N20_Cluster \
-N n20-2-sup \
-M patchdir=/var/cluster/patches \
-A trtype=dlpi,name=qfe1 -A trtype=dlpi,name=qfe5 \
-m endpoint=:qfe1,endpoint=switch1 \
-m endpoint=:qfe5,endpoint=switch2
Are these the options you want to use (yes/no) [yes]?
Do you want to continue with the install (yes/no) [yes]?
Checking device to use for global devices file system ... done
Installing patches ... failed
scinstall: Problems detected during extraction or installation of patches.
Adding node "n20-1-sup" to the cluster configuration ... skipped
Skipped node "n20-1-sup" - already configured
Adding adapter "qfe1" to the cluster configuration ... skipped
Skipped adapter "qfe1" - already configured
Adding adapter "qfe5" to the cluster configuration ... skipped
Skipped adapter "qfe5" - already configured
Adding cable to the cluster configuration ... skipped
Skipped cable - already configured
Adding cable to the cluster configuration ... skipped
Skipped cable - already configured
Copying the config from "n20-2-sup" ... done
Copying the postconfig file from "n20-2-sup" if it exists ... done
Copying the Common Agent Container keys from "n20-2-sup" ... done
Setting the node ID for "n20-1-sup" ... done (id=1)
Verifying the major number for the "did" driver with "n20-2-sup" ... done
Checking for global devices global file system ... done
Updating vfstab ... done
Verifying that NTP is configured ... done
Initializing NTP configuration ... done
Updating nsswitch.conf ...
done
Adding clusternode entries to /etc/inet/hosts ... done
Configuring IP Multipathing groups in "/etc/hostname.<adapter>" files
IP Multipathing already configured in "/etc/hostname.qfe2".
Verifying that power management is NOT configured ... done
Ensure that the EEPROM parameter "local-mac-address?" is set to "true" ... done
Ensure network routing is disabled ... done
Updating file ("ntp.conf.cluster") on node n20-2-sup ... done
Updating file ("hosts") on node n20-2-sup ... done
Updating file ("ntp.conf.cluster") on node n20-3-sup ... done
Updating file ("hosts") on node n20-3-sup ... done
Log file - /var/cluster/logs/install/scinstall.log.938
Rebooting ...
Mar 13 13:59:13 n20-1-sup reboot: rebooted by root
Terminated
root@n20-1-sup # syncing file systems... done
rebooting...
R
LOM event: +103d+20h44m26s host reset
screen not found.
keyboard not found.
Keyboard not present. Using lom-console for input and output.
Sun Netra T4 (2 X UltraSPARC-III+) , No Keyboard
Copyright 1998-2003 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.10.1, 4096 MB memory installed, Serial #52960491.
Ethernet address 0:3:ba:28:1c:eb, Host ID: 83281ceb.
Initializing 15MB Rebooting with command: boot
Boot device: /pci@8,600000/SUNW,qlc@4/fp@0,0/disk@w21000004cfa3e691,0:a File and args:
SunOS Release 5.10 Version Generic_127111-06 64-bit
Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Hostname: n20-1-sup
Configuring devices.
devfsadm: minor_init failed for module /usr/lib/devfsadm/linkmod/SUNW_scmd_link.so
Loading smf(5) service descriptions: 24/24
/usr/cluster/bin/scdidadm: Could not load DID instance list.
Cannot open /etc/cluster/ccr/did_instances.
Booting as part of a cluster
NOTICE: CMM: Node n20-1-sup (nodeid = 1) with votecount = 0 added.
NOTICE: CMM: Node n20-2-sup (nodeid = 2) with votecount = 2 added.
NOTICE: CMM: Node n20-3-sup (nodeid = 3) with votecount = 1 added.
NOTICE: clcomm: Adapter qfe5 constructed
NOTICE: clcomm: Path n20-1-sup:qfe5 - n20-2-sup:qfe5 being constructed
NOTICE: clcomm: Path n20-1-sup:qfe5 - n20-3-sup:qfe5 being constructed
NOTICE: clcomm: Adapter qfe1 constructed
NOTICE: clcomm: Path n20-1-sup:qfe1 - n20-2-sup:qfe1 being constructed
NOTICE: clcomm: Path n20-1-sup:qfe1 - n20-3-sup:qfe1 being constructed
NOTICE: CMM: Node n20-1-sup: attempting to join cluster.
NOTICE: clcomm: Path n20-1-sup:qfe1 - n20-2-sup:qfe1 being initiated
NOTICE: CMM: Node n20-2-sup (nodeid: 2, incarnation #: 1205318308) has become reachable.
NOTICE: clcomm: Path n20-1-sup:qfe1 - n20-2-sup:qfe1 online
NOTICE: clcomm: Path n20-1-sup:qfe5 - n20-3-sup:qfe5 being initiated
NOTICE: CMM: Node n20-3-sup (nodeid: 3, incarnation #: 1205265086) has become reachable.
NOTICE: clcomm: Path n20-1-sup:qfe5 - n20-3-sup:qfe5 online
NOTICE: clcomm: Path n20-1-sup:qfe5 - n20-2-sup:qfe5 being initiated
NOTICE: clcomm: Path n20-1-sup:qfe5 - n20-2-sup:qfe5 online
NOTICE: clcomm: Path n20-1-sup:qfe1 - n20-3-sup:qfe1 being initiated
NOTICE: clcomm: Path n20-1-sup:qfe1 - n20-3-sup:qfe1 online
NOTICE: CMM: Cluster has reached quorum.
NOTICE: CMM: Node n20-1-sup (nodeid = 1) is up; new incarnation number = 1205416931.
NOTICE: CMM: Node n20-2-sup (nodeid = 2) is up; new incarnation number = 1205318308.
NOTICE: CMM: Node n20-3-sup (nodeid = 3) is up; new incarnation number = 1205265086.
NOTICE: CMM: Cluster members: n20-1-sup n20-2-sup n20-3-sup.
NOTICE: CMM: node reconfiguration #23 completed.
NOTICE: CMM: Node n20-1-sup: joined cluster.
ip: joining multicasts failed (18) on clprivnet0 - will use link layer broadcasts for multicast
NOTICE: CMM: Votecount changed from 0 to 1 for node n20-1-sup.
NOTICE: CMM: Cluster members: n20-1-sup n20-2-sup n20-3-sup.
NOTICE: CMM: node reconfiguration #24 completed.
Mar 13 14:02:23 in.ndpd[351]: solicit_event: giving up on qfe1
Mar 13 14:02:23 in.ndpd[351]: solicit_event: giving up on qfe5
did subpath /dev/rdsk/c1t3d0s2 created for instance 2.
did subpath /dev/rdsk/c2t3d0s2 created for instance 12.
did subpath /dev/rdsk/c1t3d1s2 created for instance 3.
did subpath /dev/rdsk/c1t3d2s2 created for instance 6.
did subpath /dev/rdsk/c1t3d3s2 created for instance 7.
did subpath /dev/rdsk/c1t3d4s2 created for instance 8.
did subpath /dev/rdsk/c1t3d5s2 created for instance 9.
did subpath /dev/rdsk/c1t3d6s2 created for instance 10.
did subpath /dev/rdsk/c1t3d7s2 created for instance 11.
did subpath /dev/rdsk/c2t3d1s2 created for instance 13.
did subpath /dev/rdsk/c2t3d2s2 created for instance 14.
did subpath /dev/rdsk/c2t3d3s2 created for instance 15.
did subpath /dev/rdsk/c2t3d4s2 created for instance 16.
did subpath /dev/rdsk/c2t3d5s2 created for instance 17.
did subpath /dev/rdsk/c2t3d6s2 created for instance 18.
did subpath /dev/rdsk/c2t3d7s2 created for instance 19.
did instance 20 created.
did subpath n20-1-sup:/dev/rdsk/c0t6d0 created for instance 20.
did instance 21 created.
did subpath n20-1-sup:/dev/rdsk/c3t0d0 created for instance 21.
did instance 22 created.
did subpath n20-1-sup:/dev/rdsk/c3t1d0 created for instance 22.
Configuring DID devices
t_optmgmt: System error: Cannot assign requested address
obtaining access to all attached disks
n20-1-sup console login:
Load-balancing between Analytical Provider service nodes in a cluster
Hi All,
- First, a little background on my architecture. My EPM environment consists of 3 Solaris servers:
Server1: Foundation Services + APS + EAS + WLS Admin server
Server2: Foundation Services + APS + EAS
Server3: Essbase server + Essbase Studio server
All the above services are deployed to a single domain. We have a load-balancer sitting in front of server1 and server2 that redirects requests based on availability of the services.
- Consider APS:
We have an APS cluster "AnalyticProviderServices" with members AnalyticProviderServices1 deployed on Server1 and AnalyticProviderServices2 deployed on Server2.
So I connect to APS and log in as user1. Say the load-balancer decides to forward my request to server1; all my requests are then handled by APS on Server1. Now if APS on server1 is brought down, any requests to APS on server1 are redirected by WebLogic to APS on server2.
Now ideally APS on server2 should say "hey, I see APS on server1 is down, so I will take up your session where it left off". So I expect the 2nd APS node in the cluster to take up my session. But this does not happen. I need to log in again when I hit refresh in Excel, as I get the error "Invalid session. Please login again". When I open EAS I see I have been logged in with a new session ID. So it seems that the cluster nodes simply act as load-balancers and are not smart enough to take up a failed node's sessions where it left off.
Is my understanding correct, or do I have to configure something to allow this to happen?
Thanks,
Kent
Thanks for your reply, John!
I was hoping APS could do something like that. I am not sure if restoring the sessions of a dead APS cluster node on another APS node would be helpful in general, but I can think of one situation where it would: a drill-through report is running for a long time on the Essbase server and APS goes down. It would be good to have the other APS node take up the session and return the drill-through output to the user.
Can't start node in the cluster
When I start my node in the cluster I get this error. What should I do to solve it?
<Oct 30, 2011 11:06:00 PM BRST> <Error> <ALSB Statistics Manager> <BEA-473003> <Aggregation Server Not Available. Failed to get remote aggregator
java.rmi.UnknownHostException: Could not discover URL for server 'Node1'
at weblogic.protocol.URLManager.findURL(URLManager.java:145)
at com.bea.alsb.platform.weblogic.topology.WlsRemoteServerImpl.getInitialContext(WlsRemoteServerImpl.java:94)
at com.bea.alsb.platform.weblogic.topology.WlsRemoteServerImpl.lookupJNDI(WlsRemoteServerImpl.java:54)
at com.bea.wli.monitoring.statistics.ALSBStatisticsManager.getRemoteAggregator(ALSBStatisticsManager.java:291)
at com.bea.wli.monitoring.statistics.ALSBStatisticsManager.access$000(ALSBStatisticsManager.java:38)
at com.bea.wli.monitoring.statistics.ALSBStatisticsManager$RemoteAggregatorProxy.send(ALSBStatisticsManager.java:55)
at com.bea.wli.monitoring.statistics.collection.Collector.sendRawSnaphotToAggregator(Collector.java:284)
at com.bea.wli.monitoring.statistics.collection.Collector.doCheckpoint(Collector.java:245)
at com.bea.wli.monitoring.statistics.collection.Collector$CheckpointThread.doWork(Collector.java:69)
at com.bea.wli.monitoring.utils.Schedulable.timerExpired(Schedulable.java:68)
at com.bea.wli.timer.ClusterTimerImpl$InternalTimerListener.timerExpired(ClusterTimerImpl.java:255)
at weblogic.timers.internal.TimerImpl.run(TimerImpl.java:273)
at weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl.run(SelfTuningWorkManagerImpl.java:528)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:178)
>
I solved this in two steps:
1 - I ran ifconfig and checked that my network card has the MULTICAST flag
2 - I added the route by running this command:
route add -host 239.192.0.10 dev eth0
After this my nodes started successfully.
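Those two steps can be sketched as a small shell check. This is a sketch only: `eth0` is an assumption (use whatever interface carries your cluster traffic), and `239.192.0.10` is the multicast group from this setup — match it to the multicast address configured for your own WebLogic cluster:

```shell
#!/bin/sh
# Assumption: eth0 is the interface used for WebLogic cluster traffic.
IFACE=eth0

# 1) Verify the NIC has the MULTICAST flag set.
if ifconfig "$IFACE" | grep -q MULTICAST; then
    echo "$IFACE supports multicast"
else
    echo "$IFACE is missing the MULTICAST flag" >&2
    exit 1
fi

# 2) Route the cluster multicast group out through that interface.
#    239.192.0.10 is the address used in this thread; use the one
#    configured on your WebLogic cluster.
route add -host 239.192.0.10 dev "$IFACE"
```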