AlwaysOn AG issue: both nodes are primary now

Hi guys,
We have a cluster with two availability groups across two nodes. Both nodes now report themselves as primary.
What can I do now?
Many thanks,
SkyRiver

Actually, node1 says node2 is down, and node2 says node1 is down.
Windows team needs to fix the cluster issue.
SkyRiver
I don't know how much this helps with your issue, but we had a similar problem, and in our case an error was logged. You should first look at the event log errors, the cluster logs, and the SQL Server error logs.
In my case:
I was getting the error below. (SQL Server was reachable on both sides and the databases were in a synchronized state, but from either node the other node showed as off.)
Cluster IP address resource "p-ag2" cannot be brought online because a duplicate address "XX.XXX.XX.XXX" was detected on the network. Please ensure all IP addresses are unique.
Event ID 1049, along with 1205 and 1069.
Temporary solution: we shut down the secondary and started it again. This fixed it for a while, but PLEASE DO NOT DO THIS UNLESS YOU ARE SURE YOU HAVE THE SAME SPECIFIC ERROR. I also suggest opening a case and sharing the solution with everyone once it is fixed.
We worked with the VM team first.
Since our SQL Server was running on VMs, the VM rules turned out to be incorrect. They fixed them and reported that the VMs were on a single host as per the cluster requirement, but after some time the same issue came back.
This time we checked with the Windows team. They could not pinpoint the cause either, but interestingly the quorum was not configured properly for the VMs.
The VMware team then changed the VM disk configuration of the secondary to point the quorum disk to the one used by the primary. The full sequence was:
1. The VMware team sets the secondary VM's "VM restart priority" to Disabled in HA.
2. Shut down the secondary.
3. Reboot the primary and check that the cluster node and resources are OK; if not, fix that first. (This will cause a service interruption.)
4. The VMware team changes the VM disk configuration of the secondary to point the quorum disk to the one used by the primary.
5. Start the secondary.
6. Check the MSCS cluster (no errors).
7. Check SQL Server and failover of the availability groups.
8. If all validations are OK, the VMware team sets the secondary VM's "VM restart priority" back to "cluster setting" in HA.
That fixed it for us, but PLEASE DO NOT TRY THIS blindly. First share more information from the error logs, cluster logs, and event logs, and if you are not sure, please open a case with Microsoft. It would help if someone wrote this issue up in a blog, but the errors from the logs are what matter most.
Also make sure the Windows cluster service is healthy and the quorum shared between the nodes is configured correctly.
Thanks, Rama Udaya.K (http://rama38udaya.wordpress.com)
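The symptom in this thread, where each node believes it is primary, can be checked mechanically once you have the role each node reports. The following is only a sketch in Python: it assumes you have already collected the role per node (for example from role_desc in sys.dm_hadr_availability_replica_states, queried on each node separately), and the group and node names are hypothetical.

```python
def groups_with_multiple_primaries(reported_roles):
    """Return the availability groups for which more than one node claims PRIMARY.

    reported_roles maps an AG name to {node name: role that node reports for itself}.
    The input shape is an assumption made for this sketch.
    """
    conflicts = {}
    for group, roles in reported_roles.items():
        primaries = sorted(node for node, role in roles.items() if role == "PRIMARY")
        if len(primaries) > 1:
            conflicts[group] = primaries
    return conflicts


# Hypothetical example: AG1 is in the "both primary" state, AG2 is healthy.
roles = {
    "AG1": {"node1": "PRIMARY", "node2": "PRIMARY"},
    "AG2": {"node1": "PRIMARY", "node2": "SECONDARY"},
}
print(groups_with_multiple_primaries(roles))
```

A non-empty result only confirms the symptom. As the posts above say, the cluster and event logs are still needed to find the cause.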

Similar Messages

2012 SQL AlwaysOn Availability Group - JDBC re-connection not happening in case of Multisubnet failover

Hi There,
I'm having a problem with multi-subnet AG failover reconnection using SQL JDBC driver 4. Details are given below.
SETUP:
We have a three-node Windows Server 2012 cluster, and on top of it a SQL Server 2012 AlwaysOn Availability Group.
Two nodes are in one subnet and the third is in a different subnet, which makes this a multi-subnet AlwaysOn AG.
An AG listener is configured with a DNS name and two VIPs (one per subnet). We were able to fail over from one subnet to the other successfully using SQL Server Management Studio.
NEED:
We have a Java application that connects to the AG listener name. On failover, it should automatically reconnect to the database through the AG listener name.
For this we use "sqljdbc4.jar" and added 'multiSubnetFailover=true' to our connection string. The connection string is given below.
cURL="jdbc:sqlserver://testsqlag:1433;databaseName=SalesDB;multiSubnetFailover=true;loginTimeout=200;applicationName=MyApp";
THE PROBLEM:
On AG failover, the driver does not try to reconnect to the AG listener name. I'm not sure how to make it work.
Is this supported in SQL Server 2012?
Thanks,
Krishna.

Hi Sean,
Thanks for your answer.
I looked at the link earlier, but it was not clear whether the Microsoft JDBC driver will automatically reconnect to the AG listener name when the primary replica fails and a multi-subnet failover results. The confusing statements from http://msdn.microsoft.com/en-us/library/gg558121(v=sql.110).aspx are pasted below.
Also, because a connection can fail because of an availability group failover, you should implement connection retry logic, retrying a failed connection until it reconnects.
Connecting With MultiSubnetFailover :
During a multi-subnet failover, the client will attempt connections in parallel. During a subnet failover, the Microsoft JDBC Driver for SQL Server will aggressively retry the TCP connection.
Thanks,
Krishna.
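The first quoted statement says the application itself should retry a failed connection, which suggests that an existing connection is not transparently re-established by the driver. A minimal, language-neutral sketch of such retry logic is shown here in Python. The connect callable, the exception type, and the backoff values are assumptions for illustration; in the Java application the same loop would wrap the call that opens the JDBC connection.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts are used up.

    connect is a hypothetical zero-argument callable that opens a connection
    to the AG listener. This sketch assumes failures surface as ConnectionError.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError as error:
            last_error = error
            if attempt < attempts - 1:
                # Exponential backoff between attempts: base_delay, 2x, 4x, ...
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

The retry has to cover any unit of work that was in flight when the failover happened, since the old connection is gone and uncommitted work is lost with it.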

File Share Witness Resource Errors in a SQL 2012 AlwaysOn Availability Group Environment

Hi I am getting the following error in WFC Manager and in my system event log:
Event ID 1564:
File share witness resource 'File Share Witness' failed to arbitrate for the file share '\\SQL2012ClusterWitnessPath'. Please ensure that file share '\\SQL2012ClusterWitnessPath' exists and is accessible by the cluster.
Event ID 1069:
Cluster resource 'File Share Witness' of type 'File Share Witness' in clustered role 'Cluster Group' failed.
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
Event ID 1205:
The Cluster service failed to bring clustered service or application 'Cluster Group' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.
These errors showed up every hour on the hour and then suddenly stopped. I tried looking at the cluster.log file but there wasn't anything recorded there. The file share witness shows to be online and my AG did not fail over to another node.
The cluster has read and write permissions to the share. I did not find any error messages about the witness share on the remote server.
I am wondering what caused these series of events to occur?
Thanks.

Hi Kevin Ni,
Thanks for your reply. I have run the validation test and got 2 warnings. My environment has 2 nodes and 2 AGs, with each node acting as the failover target for the other node's AG, so each node hosts the primary of one AG and the secondary of the other. Both nodes are on the same subnet. Here are the warnings.
- Validate Multiple Subnet Properties
The RegisterAllProvidersIP property causes the network name to register all dependent IP addresses whether they are online or offline. Some DNS servers and clients in multi-subnet (multi-site) environments can identify the IP address that is in their subnet and attempt connections only to that address. In such environments, it is usually best to set RegisterAllProvidersIP to 1. This reduces DNS replication delays.
The RegisterAllProvidersIP property for network name 'Name: Listener1' is set to 1. For the current cluster configuration this value should be set to 0.
The RegisterAllProvidersIP property for network name 'Name: Listener2' is set to 1. For the current cluster configuration this value should be set to 0.
- Validate Network Communication
Node ONE.domain.com is reachable from Node TWO.domain.com by multiple communication paths, but each path includes network interface TWO.domain.com - TWO_NIC_Team. This network interface may be a single point of failure for communication within the cluster. Please verify that this network interface is highly available or consider adding additional networks or network interfaces to the cluster.
Node TWO.domain.com is reachable from Node ONE.domain.com by multiple communication paths, but each path includes network interface TWO.domain.com - TWO_NIC_Team. This network interface may be a single point of failure for communication within the cluster. Please verify that this network interface is highly available or consider adding additional networks or network interfaces to the cluster.
I have node TWO set up with NIC teaming. Node ONE is also set up with NIC teaming, but it has a second IP address as well, and that second IP address cannot communicate with node TWO.
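The first warning comes down to a simple rule as the report states it: with all listener IP addresses in one subnet the value should be 0, and in a multi-subnet setup 1 is usually preferred. The Python sketch below only mirrors that stated rule; it is not the validation wizard's actual logic, and the addresses are hypothetical.

```python
import ipaddress


def recommended_register_all_providers_ip(listener_addresses):
    """Return 1 when the listener IPs span more than one subnet, otherwise 0.

    listener_addresses is an iterable of "ip/prefix" strings, one per IP
    address resource the listener's network name depends on (assumed input).
    """
    subnets = {ipaddress.ip_interface(address).network for address in listener_addresses}
    return 1 if len(subnets) > 1 else 0


# Hypothetical addresses: a single-subnet listener, then a two-subnet listener.
print(recommended_register_all_providers_ip(["10.0.1.15/24"]))
print(recommended_register_all_providers_ip(["10.0.1.15/24", "10.0.2.15/24"]))
```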

SQL AlwaysOn Availability Group modifies preferred owner node on move/failover

I have a two-node Windows Server 2012 R2 failover cluster with two SQL Server 2012 Enterprise AlwaysOn Availability Groups configured. Within the cluster I have configured a preference order for the node on which each AlwaysOn Availability Group should run.
Get-ClusterGroup -Name "AlwaysOn1" | Set-ClusterOwnerNode -Owners "node1","node2";
Get-ClusterGroup -Name "AlwaysOn2" | Set-ClusterOwnerNode -Owners "node2","node1";
After configuring the preference, retrieving the preference order shows it configured as it should be. When I do a failover, the preference order is changed to list the node the group currently runs on first and the other node second, instead of being left as it was. It doesn't matter whether I use the PowerShell "Move-ClusterGroup" cmdlet or the "ALTER AVAILABILITY GROUP [AlwaysOn1] FAILOVER" T-SQL command. The cluster log shows that the preference is modified after the resource has been moved.
INFO [GUM] Node 2: executing request locally, gumId:15859, my action: /rcm/gum/SetGroupPreferredOwners, # of updates: 1
INFO [RCM] rcm::RcmGum::SetGroupPreferredOwners(AlwaysOn1,<vector len='2'>
The default groups within the cluster "Available Storage" and "Cluster Group" keep their preference setting when configured (only configured for testing to verify it has to do with the SQL Server Availability Group resource type).
The Windows Server 2012 R2 servers are updated using the latest Windows Updates and SQL 2012 has build number 11.0.5556.
For other cluster groups I have configured in the past, such as file and Hyper-V groups, I did not see this behavior. (I used those for affinity, to keep VMs with a lot of traffic between them on the same virtual switch and so reduce latency and additional hops. It would be nice if vNext had an affinity option for cluster groups besides the anti-affinity in the current version.)
I would like to know if someone could explain why the SQL Server Availability Group resource always modifies the group preference. I might understand this with a three-node cluster where the first two nodes run in synchronous-commit mode and the third in asynchronous-commit mode.
Thanks in advance,
Dennis van den Akker

Hi Dennis van den Akker,
Do not use the Failover Cluster Manager to manipulate availability groups, for example:
Do not add or remove resources in the clustered service (resource group) for the availability group.
Do not change any availability group properties, such as the possible owners and preferred owners. These properties are set automatically by the availability group.
Do not use the Failover Cluster Manager to move availability groups to different nodes or to fail over availability groups. The Failover Cluster Manager is not aware of the synchronization status of the availability replicas, and doing so can lead to extended downtime. You must use Transact-SQL or SQL Server Management Studio.
The related KB:
Failover Clustering and AlwaysOn Availability Groups (SQL Server)
https://msdn.microsoft.com/en-us/library/ff929171.aspx
The similar thread:
SQL Server Clustering 2008 (Multi Node)
https://social.msdn.microsoft.com/Forums/sqlserver/zh-CN/c35fe863-3fed-4a6e-830b-b79aa032c198/sql-server-clustering-2008-multi-node?forum=sqldisasterrecovery
Best Regards,

Why NT SERVICE\MSSQLSERVER & NT SERVICE\SQLSERVERAGENT are flagged as Windows groups on both sys.syslogins and sys.server_principals

Why NT SERVICE\MSSQLSERVER & NT SERVICE\SQLSERVERAGENT are flagged as Windows groups on both sys.syslogins and sys.server_principals?
The question is relevant, at least, for xp_logininfo, which will return an error if executed with one of these two logins, and with the “members” option:
exec xp_logininfo 'NT SERVICE\MSSQLSERVER', 'members'
Msg 15404, Level 16, State 5, Procedure xp_logininfo, Line 42
Could not obtain information about Windows NT group/user 'NT SERVICE\MSSQLSERVER', error code 0x8ac.

Thank you all for your answers, but my question is rather why those logins are flagged as Windows groups (type = 'G' in sys.server_principals and isntgroup = 1 in sys.syslogins, respectively) if, as your answers suggest, they are NOT Windows groups.
I think it is better to clarify with an example: to use xp_logininfo with the "members" option only for Windows groups, one would filter logins by "type = 'G'" or by "isntgroup = 1" from sys.server_principals or sys.syslogins, respectively.
Neither of those conditions filters out the NT SERVICE\MSSQLSERVER and NT SERVICE\SQLSERVERAGENT logins, and xp_logininfo raises the above error for them, which is expected since they are not groups and cannot have members.
Oh, and from one of my systems:
SELECT SERVERPROPERTY('productversion'), SERVERPROPERTY ('productlevel'), SERVERPROPERTY ('edition');
10.50.1600.1   RTM   Enterprise Edition (64-bit)
SELECT name,type,type_desc FROM master.sys.server_principals WHERE name LIKE 'NT SERVICE\%SQL%';
NT SERVICE\SQLSERVERAGENT   G   WINDOWS_GROUP
NT SERVICE\MSSQLSERVER   G   WINDOWS_GROUP
SELECT name,isntgroup FROM master.sys.syslogins WHERE name LIKE 'NT SERVICE\%SQL%';
NT SERVICE\SQLSERVERAGENT   1
NT SERVICE\MSSQLSERVER   1
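This does not answer why the service logins are flagged as groups, but as a workaround the filter can exclude them by name prefix before xp_logininfo is called with the "members" option. A minimal sketch in Python follows; the row shape and the CONTOSO\DBAs group are hypothetical, and the assumption is that every login under NT SERVICE\ is a per-service account with no members to enumerate.

```python
SERVICE_PREFIX = "NT SERVICE\\"


def group_logins_for_members_lookup(principals):
    """Pick the logins that are safe to pass to xp_logininfo ..., 'members'.

    principals is an iterable of (name, type) rows as selected from
    sys.server_principals. Rows flagged 'G' whose name starts with
    NT SERVICE\\ are skipped.
    """
    return [
        name
        for name, principal_type in principals
        if principal_type == "G" and not name.upper().startswith(SERVICE_PREFIX)
    ]


rows = [
    ("NT SERVICE\\SQLSERVERAGENT", "G"),
    ("NT SERVICE\\MSSQLSERVER", "G"),
    ("CONTOSO\\DBAs", "G"),  # hypothetical real Windows group
    ("sa", "S"),
]
print(group_logins_for_members_lookup(rows))
```

The same exclusion can be written directly in the T-SQL WHERE clause with a NOT LIKE on the name.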

AlwaysOn Availability groups and unable to backup LOG files

Hi,
I have an issue where we are using AlwaysOn availability groups, but my transaction log backup jobs are not working for any databases that are part of the groups. Standalone databases back up fine. I've tried the built-in maintenance plans as well as the 'maintenance script' from Ola Hallengren, which is my preference.
All the jobs report success even though nothing is being written to disk, and they complete within a few seconds.
If I back up the log files individually myself, they work fine. After doing a bit of Googling, this appears to have been a known issue that was fixed in CU7; unfortunately I'm running CU9 and still have the issue.
I've tried changing the backup preference to any replica, as well as to primary or secondary only, to no avail.
I'm using SQL Server SP1 CU9, with 2 nodes in the group. Does anyone have any suggestions?
Marcus

Hi,
I understand that you have tried every option in Backup Preferences. I still suggest you enable "For availability databases, ignore Replica Priority for Backup and Backup on Primary Settings" when you define the maintenance plan and check how it works.
This setting is off by default, and the maintenance plan checks the availability group's AUTOMATED_BACKUP_PREFERENCE setting when deciding whether to back up a database or log that belongs to the group.
Given that an availability group's default AUTOMATED_BACKUP_PREFERENCE setting is 'SECONDARY', a maintenance plan defined and running on the SQL Server instance hosting the primary replica will NOT back up databases defined in the availability group.
Reference:
http://blogs.msdn.com/b/alwaysonpro/archive/2014/01/02/maintenance-plan-does-not-backup-database-or-log-of-database-that-belongs-to-availability-group.aspx
Hope it helps.
Tracy Cai
TechNet Community Support

SUN Cluster 3.2, Solaris 10, Corrupted IPMP group on one node.

Hello folks,
I recently made a network change on nodename2 to add some resilience to IPMP (adding a second interface but still using a single IP address).
After a reboot, I cannot keep this host from rebooting. For the one minute that it stays up, I do get the following result from scstat that seems to suggest a problem with the IPMP configuration. I rolled back my IPMP change, but it still doesn't seem to register the IPMP group in scstat.
nodename2|/#scstat
-- Cluster Nodes --
Node name Status
Cluster node: nodename1 Online
Cluster node: nodename2 Online
-- Cluster Transport Paths --
Endpoint Endpoint Status
Transport path: nodename1:bge3 nodename2:bge3 Path online
-- Quorum Summary from latest node reconfiguration --
Quorum votes possible: 3
Quorum votes needed: 2
Quorum votes present: 3
-- Quorum Votes by Node (current status) --
Node Name Present Possible Status
Node votes: nodename1 1 1 Online
Node votes: nodename2 1 1 Online
-- Quorum Votes by Device (current status) --
Device Name Present Possible Status
Device votes: /dev/did/rdsk/d3s2 0 1 Offline
-- Device Group Servers --
Device Group Primary Secondary
Device group servers: jms-ds nodename1 nodename2
-- Device Group Status --
Device Group Status
Device group status: jms-ds Online
-- Multi-owner Device Groups --
Device Group Online Status
-- IPMP Groups --
Node Name Group Status Adapter Status
scstat: unexpected error.
I did manage to run scstat on nodename1 while nodename2 was still up between reboots, here is that result (it does not show any IPMP group(s) on nodename2)
nodename1|/#scstat
-- Cluster Nodes --
Node name Status
Cluster node: nodename1 Online
Cluster node: nodename2 Online
-- Cluster Transport Paths --
Endpoint Endpoint Status
Transport path: nodename1:bge3 nodename2:bge3 faulted
-- Quorum Summary from latest node reconfiguration --
Quorum votes possible: 3
Quorum votes needed: 2
Quorum votes present: 3
-- Quorum Votes by Node (current status) --
Node Name Present Possible Status
Node votes: nodename1 1 1 Online
Node votes: nodename2 1 1 Online
-- Quorum Votes by Device (current status) --
Device Name Present Possible Status
Device votes: /dev/did/rdsk/d3s2 1 1 Online
-- Device Group Servers --
Device Group Primary Secondary
Device group servers: jms-ds nodename1 -
-- Device Group Status --
Device Group Status
Device group status: jms-ds Degraded
-- Multi-owner Device Groups --
Device Group Online Status
-- IPMP Groups --
Node Name Group Status Adapter Status
IPMP Group: nodename1 sc_ipmp1 Online bge2 Online
IPMP Group: nodename1 sc_ipmp0 Online bge0 Online
-- IPMP Groups in Zones --
Zone Name Group Status Adapter Status
I believe that I should be able to delete the IPMP group for the second node from the cluster and re-add it, but I'm not sure how to go about doing this. I welcome your comments or thoughts on what I can try before rebuilding this node from scratch.
-AG
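The quorum summary in both scstat outputs follows simple majority arithmetic: with 3 possible votes (one per node plus one for the quorum device), 2 are needed. The small worked example below is written in Python, with hypothetical function names. It also shows why the cluster can still form while the quorum device is offline, as in the current status reported on nodename2.

```python
def votes_needed(votes_possible):
    """A majority of the possible votes, e.g. 2 of 3."""
    return votes_possible // 2 + 1


def has_quorum(votes_present, votes_possible):
    return votes_present >= votes_needed(votes_possible)


# Figures from the scstat output: two node votes plus one quorum device vote.
print(votes_needed(3))
# Both nodes online, quorum device offline: 1 + 1 + 0 votes present.
print(has_quorum(2, 3))
# A single node without the quorum device could not form the cluster alone.
print(has_quorum(1, 3))
```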

I was able to restart both sides of the cluster. Now both sides are online, but neither side can access the shared disk.
Lots of warnings. I will keep poking....
Rebooting with command: boot
Boot device: /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/disk@0,0:a File and args:
SunOS Release 5.10 Version Generic_141444-09 64-bit
Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Hardware watchdog enabled
Hostname: nodename2
Jul 21 10:00:16 in.mpathd[221]: No test address configured on interface ce3; disabling probe-based failure detection on it
Jul 21 10:00:16 in.mpathd[221]: No test address configured on interface bge0; disabling probe-based failure detection on it
/usr/cluster/bin/scdidadm: Could not stat "../../devices/iscsi/[email protected],0:c,raw" - No such file or directory.
Warning: Path node loaded - "../../devices/iscsi/[email protected],0:c,raw".
/usr/cluster/bin/scdidadm: Could not stat "../../devices/iscsi/[email protected],1:c,raw" - No such file or directory.
Warning: Path node loaded - "../../devices/iscsi/[email protected],1:c,raw".
Booting as part of a cluster
NOTICE: CMM: Node nodename1 (nodeid = 1) with votecount = 1 added.
NOTICE: CMM: Node nodename2 (nodeid = 2) with votecount = 1 added.
WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d3s2 with error 2.
NOTICE: clcomm: Adapter bge3 constructed
NOTICE: CMM: Node nodename2: attempting to join cluster.
NOTICE: CMM: Node nodename1 (nodeid: 1, incarnation #: 1279727883) has become reachable.
NOTICE: clcomm: Path nodename2:bge3 - nodename1:bge3 online
WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d3s2 with error 2.
NOTICE: CMM: Cluster has reached quorum.
NOTICE: CMM: Node nodename1 (nodeid = 1) is up; new incarnation number = 1279727883.
NOTICE: CMM: Node nodename2 (nodeid = 2) is up; new incarnation number = 1279728026.
NOTICE: CMM: Cluster members: nodename1 nodename2.
NOTICE: CMM: node reconfiguration #3 completed.
NOTICE: CMM: Node nodename2: joined cluster.
NOTICE: CCR: Waiting for repository synchronization to finish.
WARNING: CCR: Invalid CCR table : dcs_service_9 cluster global.
WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d3s2 with error 2.
ip: joining multicasts failed (18) on clprivnet0 - will use link layer broadcasts for multicast
==> WARNING: DCS: Error looking up services table
==> WARNING: DCS: Error initializing service 9 from file
/usr/cluster/bin/scdidadm: Could not stat "../../devices/iscsi/[email protected],0:c,raw" - No such file or directory.
Warning: Path node loaded - "../../devices/iscsi/[email protected],0:c,raw".
/usr/cluster/bin/scdidadm: Could not stat "../../devices/iscsi/[email protected],1:c,raw" - No such file or directory.
Warning: Path node loaded - "../../devices/iscsi/[email protected],1:c,raw".
/dev/md/rdsk/d22 is clean
Reading ZFS config: done.
NOTICE: iscsi session(6) iqn.1994-12.com.promise.iscsiarray2 online
nodename2 console login: obtaining access to all attached disks
starting NetWorker daemons:
Rebooting with command: boot
Boot device: /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/disk@0,0:a File and args:
SunOS Release 5.10 Version Generic_141444-09 64-bit
Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Hardware watchdog enabled
Hostname: nodename1
/usr/cluster/bin/scdidadm: Could not stat "../../devices/iscsi/[email protected],0:c,raw" - No such file or directory.
Warning: Path node loaded - "../../devices/iscsi/[email protected],0:c,raw".
/usr/cluster/bin/scdidadm: Could not stat "../../devices/iscsi/[email protected],1:c,raw" - No such file or directory.
Warning: Path node loaded - "../../devices/iscsi/[email protected],1:c,raw".
Booting as part of a cluster
NOTICE: CMM: Node nodename1 (nodeid = 1) with votecount = 1 added.
NOTICE: CMM: Node nodename2 (nodeid = 2) with votecount = 1 added.
WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d3s2 with error 2.
NOTICE: clcomm: Adapter bge3 constructed
NOTICE: CMM: Node nodename1: attempting to join cluster.
NOTICE: bge3: link up 1000Mbps Full-Duplex
NOTICE: clcomm: Path nodename1:bge3 - nodename2:bge3 errors during initiation
WARNING: Path nodename1:bge3 - nodename2:bge3 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d3s2 with error 2.
NOTICE: CMM: Cluster doesn't have operational quorum yet; waiting for quorum.
NOTICE: bge3: link down
NOTICE: bge3: link up 1000Mbps Full-Duplex
NOTICE: CMM: Node nodename2 (nodeid: 2, incarnation #: 1279728026) has become reachable.
NOTICE: clcomm: Path nodename1:bge3 - nodename2:bge3 online
WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d3s2 with error 2.
NOTICE: CMM: Cluster has reached quorum.
NOTICE: CMM: Node nodename1 (nodeid = 1) is up; new incarnation number = 1279727883.
NOTICE: CMM: Node nodename2 (nodeid = 2) is up; new incarnation number = 1279728026.
NOTICE: CMM: Cluster members: nodename1 nodename2.
NOTICE: CMM: node reconfiguration #3 completed.
NOTICE: CMM: Node nodename1: joined cluster.
WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d3s2 with error 2.
ip: joining multicasts failed (18) on clprivnet0 - will use link layer broadcasts for multicast
/usr/cluster/bin/scdidadm: Could not stat "../../devices/iscsi/[email protected],0:c,raw" - No such file or directory.
Warning: Path node loaded - "../../devices/iscsi/[email protected],0:c,raw".
/usr/cluster/bin/scdidadm: Could not stat "../../devices/iscsi/[email protected],1:c,raw" - No such file or directory.
Warning: Path node loaded - "../../devices/iscsi/[email protected],1:c,raw".
/dev/md/rdsk/d26 is clean
Reading ZFS config: done.
NOTICE: iscsi session(6) iqn.1994-12.com.promise.iscsiarray2 online
nodename1 console login: obtaining access to all attached disks
starting NetWorker daemons:
nsrexecd
mount: /dev/md/jms-ds/dsk/d100 is already mounted or /opt/esbshares is busy

AlwaysOn Availability Groups for a WSFC hosting multiple SQL FCIs

Hello,
I have a question regarding SQL Server FCIs and AlwaysOn availability groups.
Today I have a customer running a 2-node WSFC hosting 9 different SQL Server FCIs.
The plan is to introduce AlwaysOn availability groups on this setup, one per SQL Server FCI.
For each FCI running on the 2-node WSFC, we plan to introduce a dedicated AlwaysOn availability group with 2 replicas: one replica as primary, hosted by the FCI, and one replica as secondary on 1 extra node added to the WSFC.
My question is: is this a supported configuration?
Thanks in advance
Regards
Raf

This is a supported scenario. However, make sure that no FCI node will run a SQL Server instance acting as a replica. This means you cannot install a standalone instance on one of the FCI nodes and make it an AG replica. You also cannot configure another FCI in the same WSFC to be a secondary replica in an AG. If you intend to configure an AG per SQL Server FCI, you need to add another node to the WSFC and install a SQL Server instance on that node. This can be a SQL Server FCI (which means you need to add two more nodes instead of just one) or a standalone instance. Also, having an FCI in an AG configuration means that you lose the ability to do automatic failover at the AG level. This is because the FCI takes care of automatic failover at the instance level.
Edwin Sarmiento SQL Server MVP | Microsoft Certified Master
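The node-placement rule described above can be checked before installation by listing, for each planned replica of one AG, the WSFC nodes that could host it and looking for overlap. The following is only a sketch in Python; the replica and node names are hypothetical, and the input is assumed to be the possible-owner nodes of each instance.

```python
def nodes_hosting_multiple_replicas(replica_nodes):
    """Return the nodes that could host more than one replica of the same AG.

    replica_nodes maps a replica (instance) name to the set of WSFC nodes
    it can run on: both nodes for an FCI, one node for a standalone instance.
    """
    hosts = {}
    for replica, nodes in replica_nodes.items():
        for node in nodes:
            hosts.setdefault(node, []).append(replica)
    return {node: sorted(names) for node, names in hosts.items() if len(names) > 1}


# The plan from the question: FCI on node1/node2, standalone secondary on a new node3.
print(nodes_hosting_multiple_replicas({"FCI1": {"node1", "node2"}, "STANDALONE1": {"node3"}}))
# A standalone replica installed on one of the FCI nodes would violate the rule.
print(nodes_hosting_multiple_replicas({"FCI1": {"node1", "node2"}, "STANDALONE1": {"node2"}}))
```

An empty result means no node is shared between replicas of that AG. The check has to be repeated for each of the nine planned groups.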

Data guard setup for 2 node RAC primary to 2 node RAC standby

Hi All,
I am going to set up Data Guard from a 2-node RAC primary to a 2-node RAC standby on Oracle 10.2.0.4 on AIX 5L.
Can you please provide a document for the above setup that covers all the steps in detail?
I am also looking for documents on different scenarios, such as:
1) If one node of the standby goes down, how will the redo logs be applied? Is there any problem?
2) If both nodes of the standby fail, how do I recover them?
3) If one node of the primary fails, is there any issue?
4) If both nodes of the primary fail, is there any issue?
Thanks in advance,
Mahi

Have a look at the following location, you may find some similar documents:
http://www.oracle.com/technology/deploy/availability/htdocs/maa.htm
By
http://www.oraxperts.com

FWSM on 6500 TCP connection issues after crash on primary

I'm experiencing a rather strange issue that has me stumped.
We are running an FWSM on a 6509 with a SUP720. Firmware 3.2(18), in MultiContext Routed Mode, with shared MSFC.
Everything runs fine on this baby most of the time. Occasionally, however, without warning and with no specific pattern, the primary node fails (as in completely stops responding) and the secondary takes over as active.
To get the primary up again, I reset the hw-module and then issue "no failover active" on the secondary to return the primary to active. After this event, however, I start to experience strange connectivity issues. Certain TCP source/destination combinations just will not work. Take the following example:
A TCP/1433 connection from Inside IP: 10.3.3.196 to outside IP: 10.252.20.63, logs look like this:
2012-08-07 13:43:13:0868          + 13435          2012-08-07 13:43:09     Local5.Info     192.168.2.7     Aug 07 2012 11:31:19: %FWSM-6-302013: Built outbound TCP connection 145674175523995444 for servers:10.3.3.196/64112 (10.3.3.196/64112) to outside:10.252.20.63/1433 (10.252.20.63/1433)
2012-08-07 13:43:13:0868          + 13436          2012-08-07 13:43:09     Local5.Info     192.168.2.7     Aug 07 2012 11:31:19: %FWSM-6-302014: Teardown TCP connection 145674175523995444 for servers:10.3.3.196/64112 to outside:10.252.20.63/1433 duration 0:00:00 bytes 128 TCP Reset-O
2012-08-07 13:43:13:0868          + 13526          2012-08-07 13:43:09     Local5.Info     192.168.2.7     Aug 07 2012 11:31:19: %FWSM-6-106028: Deny TCP (Connection marked for Deletion) from 10.3.3.196/64112 to 10.252.20.63/1433 flags SYN on interface servers
2012-08-07 13:43:13:0875          + 13670          2012-08-07 13:43:10     Local5.Info     192.168.2.7     Aug 07 2012 11:31:20: %FWSM-6-302013: Built outbound TCP connection 145674175523995445 for servers:10.3.3.196/64112 (10.3.3.196/64112) to outside:10.252.20.63/1433 (10.252.20.63/1433)
2012-08-07 13:43:13:0875          + 13671          2012-08-07 13:43:10     Local5.Info     192.168.2.7     Aug 07 2012 11:31:20: %FWSM-6-302014: Teardown TCP connection 145674175523995445 for servers:10.3.3.196/64112 to outside:10.252.20.63/1433 duration 0:00:00 bytes 124 TCP Reset-O
However, I created a specific ACL on the upstream router's interface to see if I got any matches, and the traffic is not even leaving the 6509. I can ping the remote device without any issues, and I can confirm that the xlate has been built.
This connection was working fine prior to the crash, and the ACL rules are correct and do allow the connection on both the local FWSM and the remote firewall.
Currently my only resolution is to reboot the FWSM on both nodes at the same time so that we have a complete fresh start. This is not ideal!
Anyone know of issues like this? Any suggestions for workarounds or perhaps ways to troubleshoot the reason for the crash?
Thanks!
Craig
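When collecting evidence for a case like this, it can help to pull the teardown reason and byte count out of each 302014 message so that affected source/destination pairs can be grouped. Below is a minimal sketch in Python; the regular expression is written against the log lines quoted above and is an assumption about the format, not a full FWSM syslog parser.

```python
import re

TEARDOWN = re.compile(
    r"%FWSM-6-302014: Teardown TCP connection (?P<conn_id>\d+) "
    r"for (?P<src>\S+) to (?P<dst>\S+) "
    r"duration (?P<duration>\S+) bytes (?P<bytes>\d+) (?P<reason>.+)$"
)


def parse_teardown(line):
    """Return the fields of a 302014 teardown message, or None for other lines."""
    match = TEARDOWN.search(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["bytes"] = int(record["bytes"])
    return record


line = (
    "Aug 07 2012 11:31:19: %FWSM-6-302014: Teardown TCP connection 145674175523995444 "
    "for servers:10.3.3.196/64112 to outside:10.252.20.63/1433 "
    "duration 0:00:00 bytes 128 TCP Reset-O"
)
print(parse_teardown(line))
```

In the quoted lines every teardown has a duration of 0:00:00 and the reason "TCP Reset-O", so the connections are being reset almost immediately after they are built.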

Hi Bro
Perhaps, this could be a hardware related issue concerning your Primary FWSM. However, before we can conclude that, could you upgrade your FWSM to the latest image v4.1.7?

Intermittent Connection Timeout at Post-Login when connecting to AlwaysOn Availability Group listener

Hi,
I'm using SQL 2014 AlwaysOn Availability Group and my ASP .Net application is connected to the database thru the availability group listener (tcp:[myHAGroup],14330).
Things work most of the time, but recently I started to see the following Connection Timeout exception in Event Log. It looks like under heavy traffic the application can't connect to the availability group listener.
Timestamp: 8/17/2014 10:07:28 AM
Message: System.Data.EntityException: The underlying provider failed on Open. ---> System.Data.SqlClient.SqlException: Connection Timeout Expired. The timeout period elapsed during the post-login phase. The connection could have timed out while waiting for server to complete the login process and respond; Or it could have timed out while attempting to create multiple active connections. The duration spent while attempting to connect to this server was - [Pre-Login] initialization=2; handshake=0; [Login] initialization=0; authentication=0; [Post-Login] complete=15008; ---> System.ComponentModel.Win32Exception: The wait operation timed out
--- End of inner exception stack trace ---
I've checked SQL Profiler and don't see any memory or CPU peak at that time. It looks like something happened during the Post-Login stage, but I couldn't figure out what or why.
I'm using 2 SQL servers (1 primary, 1 secondary): SQL 2014 ENT, 8GB RAM, 8 core 2.67 Xeon X5650.
Application: ASP .Net, framework 4.0.
All servers are VMWare virtualization servers, running on 1 virtual switch, so internal traffic should not be a problem.
Comments/ideas are appreciated.
Eric.
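The timing breakdown at the end of the exception message shows which connection phase used up the time. Below is a minimal Python sketch, not part of the original post, that parses that breakdown. The function names parse_phase_timings and slowest_phase are made up for illustration, and reading the values as milliseconds is an assumption (it fits the 15-second default connection timeout).

```python
import re

# Timing portion of the SqlException message quoted above.
MESSAGE = (
    "[Pre-Login] initialization=2; handshake=0; "
    "[Login] initialization=0; authentication=0; "
    "[Post-Login] complete=15008;"
)


def parse_phase_timings(message):
    """Return {phase: {step: value}} parsed from the timing breakdown."""
    timings = {}
    phase = None
    # Each match is either a "[Phase]" marker or a "step=number" pair.
    for marker, step, value in re.findall(r"\[([^\]]+)\]|(\w+)=(\d+)", message):
        if marker:
            phase = marker
            timings[phase] = {}
        elif phase is not None:
            timings[phase][step] = int(value)
    return timings


def slowest_phase(timings):
    """Return (phase, total) for the phase with the largest summed value."""
    totals = {phase: sum(steps.values()) for phase, steps in timings.items()}
    phase = max(totals, key=totals.get)
    return phase, totals[phase]


if __name__ == "__main__":
    print(slowest_phase(parse_phase_timings(MESSAGE)))
```

For the message above, nearly all of the time falls under Post-Login, which matches the text of the exception.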

Hi Eric Nguyen,
According to your description and error message, we need to verify via SQL trace whether any queries or transactions are locking your database. If so, you have to find out which queries are blocking and rewrite them, or run them at another time, to avoid blocking other processes.
In addition, you can use Performance Monitor to track connection pool connections. If the connection limit and timeout limit are set to small values, the “wait operation timed out” error can occur. Meanwhile, you also need to monitor disk usage on SQL Server, and set the autogrowth increment of your transaction log and database files to a fixed size instead of a percentage of the current file size, to avoid transaction timeouts. For more information, see:
http://stackoverflow.com/questions/7743725/how-to-troubleshoot-intermittent-sql-timeout-errors
There is a similar issue about failed connections from web servers to a SQL Server AlwaysOn listener; you can review it.
http://stackoverflow.com/questions/23416492/connection-timeouts-when-using-multisubnetfailover-true
Regards,
Sofiya Li
TechNet Community Support
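To illustrate the advice about fixed-size versus percentage autogrowth: with percentage growth, each growth event is larger than the previous one, so a late growth event on a big file can make transactions wait longer. The sketch below is only arithmetic; the starting size, the 10% setting and the 512 MB increment are hypothetical numbers, not values from this thread.

```python
def growth_steps_percent(start_mb, percent, events):
    """Increment (in MB) of each growth event when growth is a percentage
    of the current file size."""
    size = start_mb
    steps = []
    for _ in range(events):
        increment = size * percent / 100
        steps.append(round(increment, 1))
        size += increment
    return steps


def growth_steps_fixed(increment_mb, events):
    """Increment (in MB) of each growth event when growth is a fixed size."""
    return [increment_mb] * events


if __name__ == "__main__":
    # Hypothetical 10 GB file growing three times.
    print(growth_steps_percent(10240, 10, 3))
    print(growth_steps_fixed(512, 3))
```

With a fixed increment every growth event costs roughly the same, which makes its duration easier to predict.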

Switching resource group in 2 node cluster fails

Hi,
I configured a 2-node cluster to provide high availability for my Oracle DB 9.2.0.7.
I created a resource group and named it oracleha-rg,
and later I created the following resources:
oraclelh-rs for the logical hostname
hastp-rs for the HA storage resource
oracle-server-rs for the Oracle server resource
and listener-rs for the listener
Whenever I try to switch the resource group between nodes, it gives me the following in dmesg:
Feb 6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hafoip_stop> for resource <oraclelh-rs>, resource group <oracleha-rg>, node <DB1>, timeout <300> seconds
Feb 6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource oraclelh-rs status on node DB1 change to R_FM_UNKNOWN
Feb 6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource oraclelh-rs status msg on node DB1 change to <Stopping>
Feb 6 16:17:49 DB1 ip: [ID 678092 kern.notice] TCP_IOC_ABORT_CONN: local = 010.050.033.009:0, remote = 000.000.000.000:0, start = -2, end = 6
Feb 6 16:17:49 DB1 ip: [ID 302654 kern.notice] TCP_IOC_ABORT_CONN: aborted 0 connection
Feb 6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource oraclelh-rs status on node DB1 change to R_FM_OFFLINE
Feb 6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource oraclelh-rs status msg on node DB1 change to <LogicalHostname offline.>
Feb 6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <hafoip_stop> completed successfully for resource <oraclelh-rs>, resource group <oracleha-rg>, node <DB1>, time used: 0% of timeout <300 seconds>
Feb 6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource oraclelh-rs state on node DB1 change to R_OFFLINE
Feb 6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_postnet_stop> for resource <hastp-rs>, resource group <oracleha-rg>, node <DB1>, timeout <1800> seconds
Feb 6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource hastp-rs status on node DB1 change to R_FM_UNKNOWN
Feb 6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource hastp-rs status msg on node DB1 change to <Stopping>
Feb 6 16:17:49 DB1 SC[,SUNW.HAStoragePlus:8,oracleha-rg,hastp-rs,hastorageplus_postnet_stop]: [ID 843127 daemon.warning] Extension properties FilesystemMountPoints and GlobalDevicePaths and Zpools are empty.
Feb 6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <hastorageplus_postnet_stop> completed successfully for resource <hastp-rs>, resource group <oracleha-rg>, node <DB1>, time used: 0% of timeout <1800 seconds>
Feb 6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource hastp-rs state on node DB1 change to R_OFFLINE
Feb 6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource hastp-rs status on node DB1 change to R_FM_OFFLINE
Feb 6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource hastp-rs status msg on node DB1 change to <>
Feb 6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 529407 daemon.error] resource group oracleha-rg state on node DB1 change to RG_OFFLINE_START_FAILED
Feb 6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group oracleha-rg state on node DB1 change to RG_OFFLINE
Feb 6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 447451 daemon.notice] Not attempting to start resource group <oracleha-rg> on node <DB1> because this resource group has already failed to start on this node 2 or more times in the past 3600 seconds
Feb 6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 447451 daemon.notice] Not attempting to start resource group <oracleha-rg> on node <DB2> because this resource group has already failed to start on this node 2 or more times in the past 3600 seconds
Feb 6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 674214 daemon.notice] rebalance: no primary node is currently found for resource group <oracleha-rg>.
Feb 6 16:19:08 DB1 Cluster.RGM.global.rgmd: [ID 603096 daemon.notice] resource hastp-rs disabled.
Feb 6 16:19:17 DB1 Cluster.RGM.global.rgmd: [ID 603096 daemon.notice] resource oraclelh-rs disabled.
Feb 6 16:19:22 DB1 Cluster.RGM.global.rgmd: [ID 603096 daemon.notice] resource oracle-rs disabled.
Feb 6 16:19:27 DB1 Cluster.RGM.global.rgmd: [ID 603096 daemon.notice] resource listener-rs disabled.
Feb 6 16:19:51 DB1 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group oracleha-rg state on node DB1 change to RG_OFF_PENDING_METHODS
Feb 6 16:19:51 DB1 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group oracleha-rg state on node DB2 change to RG_OFF_PENDING_METHODS
Feb 6 16:19:51 DB1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <bin/oracle_listener_fini> for resource <listener-rs>, resource group <oracleha-rg>, node <DB1>, timeout <30> seconds
Feb 6 16:19:51 DB1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <bin/oracle_listener_fini> completed successfully for resource <listener-rs>, resource group <oracleha-rg>, node <DB1>, time used: 0% of timeout <30 seconds>
Feb 6 16:19:51 DB1 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group oracleha-rg state on node DB1 change to RG_OFFLINE
Feb 6 16:19:51 DB1 Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group oracleha-rg state on node DB2 change to RG_OFFLINE
and the resource group fails to switch...
any help please?
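Most of the log above is daemon.notice output, so the few warning and error lines are easy to miss. Below is a minimal Python sketch, not part of the original post, that filters a syslog excerpt like this one down to the non-notice entries. The function name non_notice_messages is made up for illustration, and the sample lines are copied from the log above. Any leading or trailing + characters left over from forum formatting are stripped before matching.

```python
import re

# Three lines copied from the log above: one notice, one warning, one error.
SAMPLE = "\n".join([
    "Feb 6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] "
    "resource oraclelh-rs status msg on node DB1 change to <Stopping>",
    "Feb 6 16:17:49 DB1 SC[,SUNW.HAStoragePlus:8,oracleha-rg,hastp-rs,"
    "hastorageplus_postnet_stop]: [ID 843127 daemon.warning] Extension "
    "properties FilesystemMountPoints and GlobalDevicePaths and Zpools are empty.",
    "Feb 6 16:17:49 DB1 Cluster.RGM.global.rgmd: [ID 529407 daemon.error] "
    "resource group oracleha-rg state on node DB1 change to RG_OFFLINE_START_FAILED",
])

# Matches the "[ID <number> <facility>.<severity>] <message>" part of a line.
ENTRY = re.compile(r"\[ID (\d+) (\w+)\.(\w+)\] (.*)")


def non_notice_messages(log_text):
    """Return (severity, message) for every entry that is not a notice."""
    found = []
    for raw in log_text.splitlines():
        line = raw.strip().strip("+")
        match = ENTRY.search(line)
        if match and match.group(3) != "notice":
            found.append((match.group(3), match.group(4)))
    return found


if __name__ == "__main__":
    for severity, message in non_notice_messages(SAMPLE):
        print(severity, "|", message)
```

Run against the full log, this leaves the HAStoragePlus warning about empty extension properties and the RG_OFFLINE_START_FAILED error, which are probably the lines to quote when asking for help.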

Hi,
this forum is for Oracle Clusterware, not Solaris Cluster. You probably should close this thread and open your question in the corresponding Solaris Cluster forum, to get help.
Regards
Sebastian

Graphical mapping issue, repeated lines are not populating

Hello All,
I have a mapping where I need to map a source node and its corresponding subelements to a target node and its subelements based on a certain condition, which I did using an If condition.
My issue is that the conditional field is not in the node I am mapping; it is in the parent of that node. So when I map to the target node, only one occurrence is populated on the target side. Even when the lines repeat more than once, not all of them are populated. Please let me know how to populate all line items when the validation field is in the parent node and the child node and its subelements are to be mapped to the target node. (The parent node and subnode in the source, and the target node, are all unbounded.)
Thanks in advance.
Regards,
Kalpana

Thanks for your reply. My source structure looks like below.
I have to map the row nodes and their corresponding fields to the target node Lines, which is unbounded, and to its subelements when collection name = 'MATGRP_COLLN'. I did it using an If condition, but only one row is getting populated in the target Lines node. The values of the following repeated rows are not populated on the target side.
- <extensions>
+ <collection name="REGION_COLLN" />
+ <collection name="DIVISION_COLLN" />
+ <collection name="DTL_PRCNG_COLLN" />
+ <collection name="COUNTRY_COLLN">
- <collection name="MATGRP_COLLN">
- <row>
- <fields>
<OBJECTID>-2147483540</OBJECTID>
<UNIQUE_DOC_NAME>-21474835401248121690973</UNIQUE_DOC_NAME>
<DISPLAY_NAME />
<MATGRP_ITEM_NO>1</MATGRP_ITEM_NO>
<MATGRP_NAME>10100000</MATGRP_NAME>
</fields>
</row>
- <row>
- <fields>
<OBJECTID>-2147483539</OBJECTID>
<UNIQUE_DOC_NAME>-21474835391248121706160</UNIQUE_DOC_NAME>
<DISPLAY_NAME />
<MATGRP_ITEM_NO>2</MATGRP_ITEM_NO>
<MATGRP_NAME>10101500</MATGRP_NAME>
</fields>
</row>
</collection>
</extensions>
</object>
</objects>
</fcidataexport>
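To make the expected result concrete: every row under the collection whose name is MATGRP_COLLN should produce one entry in the target Lines node. The Python sketch below is not the graphical mapping itself and does not show how to fix it. It only spells out the intended outcome on a trimmed, well-formed copy of the structure above. The function name rows_for_collection and the shortened sample are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the source structure shown above.
SOURCE = """
<extensions>
  <collection name="REGION_COLLN" />
  <collection name="MATGRP_COLLN">
    <row>
      <fields>
        <MATGRP_ITEM_NO>1</MATGRP_ITEM_NO>
        <MATGRP_NAME>10100000</MATGRP_NAME>
      </fields>
    </row>
    <row>
      <fields>
        <MATGRP_ITEM_NO>2</MATGRP_ITEM_NO>
        <MATGRP_NAME>10101500</MATGRP_NAME>
      </fields>
    </row>
  </collection>
</extensions>
"""


def rows_for_collection(xml_text, collection_name):
    """Return one dict per <row> under the collection with the given name."""
    root = ET.fromstring(xml_text.strip())
    lines = []
    for collection in root.iter("collection"):
        # The condition is tested on the parent, not on the row itself.
        if collection.get("name") != collection_name:
            continue
        for fields in collection.findall("./row/fields"):
            lines.append({child.tag: (child.text or "") for child in fields})
    return lines


if __name__ == "__main__":
    for line in rows_for_collection(SOURCE, "MATGRP_COLLN"):
        print(line)
```

The condition on the parent is evaluated once and then applies to every row below it, so the sample yields two lines, not one.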

The specified nodes are not clusterable

Hello,
I am trying to install RAC 10g. During the installation I got an error saying the specified nodes are not clusterable. I have checked the things below on both nodes; node reachability and node connectivity are successful.
ssh rac1 date
ssh rac1-priv date
ssh rac2 date
ssh rac2-priv date
Could you please help me resolve this issue?
Thanks
Nitya
Edited by: user8773474 on Aug 28, 2009 8:32 AM

Hello Andy,
I looked for the link to reward you, but I couldn't find it.
Many thanks for your quick response.
For now I can mark it as helpful. I will download it and try it; if it works, I will update it to correct.
Best Regards
Nitya

PPOME: pers no, EE group, EE subgroup, pers area do not appear in Basic data tab?

Hi Friends,
I am new to SAP HR. I am trying to assign a person to a position in tcode PPOME. After that, the Pers No, Name, EE group, EE subgroup and Pers area details do not appear in the Basic data tab. But for other persons who already exist in the system, I can see the pers no, name, EE group, EE subgroup and pers area details. Could you advise what the issue might be? Thank you.
Regards,
Hock Teck..

Hi Rajesh / Thomaselsy,
Thanks for your response. Please see my output below:
1. The Plogi Orga value is 'X'.
2. I created the person in PA with pers area, EE group and EE subgroup details, and I can see all these details in table PA0001. I did not create pers area, EE group or EE subgroup for the position IDs. I want to see the pers area, EE group and EE subgroup details from PA0001 in PPOME under the Basic data tab.
3. Whether I assign a position ID to a person in PA40 or assign a person in PPOME to a newly created position ID, in both cases the details are updated correctly in both the PA0001 and HRP1001 tables.
4. I also tried assigning a position ID to a person in PA40 and then assigning a new position ID to that same person through PPOME. The new position ID was then replaced properly in PA40.
But when I double-click the person in PPOME, I cannot see pers no, name, EE group, EE subgroup or pers area under the Basic data tab. Please let me know how to proceed further.
Regards,
Hock Teck.
