Degraded Health Sets
Hi,
I am currently running an Exchange 2013 CU2 DAG with 2 database and 2 CAS servers. SCOM is reporting the following, but I can find very little info on it:
Alert: Health Set unhealthy
Source: <server name> - Outlook.Protocol
Last modified by: System
Last modified time: 4/2/2014 3:49:58 PM
Alert description: EMSMDB.DoRpc(Logon) step of OutlookRpcDeepTestProbe/<database name> has failed against <server name> proxying to <server name> for HealthMailboxb63d235bb56b428ebf56ea594d3ca0c7@CEOSMTPServer.
Latency: 00:00:00.0520000
ActivityContext: I32:ADS.C[Apollo]=1;F:ADS.AL[Apollo]=3.3585;I32:ADR.C[Apollo]=1;F:ADR.AL[Apollo]=3.0093;I32:ADS.C[Razor]=2;F:ADS.AL[Razor]=2.0185
Outline: [50] EMSMDB.Connect(); [1][FAILED!] EMSMDB.DoRpc(Logon); Likely root cause: Momt
Details:
Error: Error returned in LogonCallResult. Error code = WrongServer (0x00000478)
Log: Mailbox logon verification
EMSMDB.Connect()
Task produced output:
- TaskStarted = 2/04/2014 3:49:25 PM
- TaskFinished = 2/04/2014 3:49:25 PM
- ErrorDetails =
- RespondingRpcClientAccessServerVersion = 15.0.712.4012
- Latency = 00:00:00.0505291
- ActivityContext = I32:ADS.C[Apollo]=1;F:ADS.AL[Apollo]=3.3585;I32:ADR.C[Apollo]=1;F:ADR.AL[Apollo]=3.0093;I32:ADS.C[Razor]=2;F:ADS.AL[Razor]=2.0185
EMSMDB.Connect() completed successfully.
EMSMDB.DoRpc(Logon)
Task produced output:
- TaskStarted = 2/04/2014 3:49:25 PM
- TaskFinished = 2/04/2014 3:49:25 PM
- Exception = Microsoft.Exchange.RpcClientAccess.RopExecutionException: Error returned in LogonCallResult. Error code = WrongServer (0x00000478)
- ErrorDetails =
- Latency = 00:00:00.0010381
- ActivityContext = I32:ADS.C[Apollo]=1;F:ADS.AL[Apollo]=3.3585;I32:ADR.C[Apollo]=1;F:ADR.AL[Apollo]=3.0093;I32:ADS.C[Razor]=2
Any help would be greatly appreciated.
Thanks
Hi,
Please run the following command and post the output:
Get-ServerHealth -Identity Servername -HealthSet Outlook.Protocol
In addition, I recommend that you run Test-MapiConnectivity and check Event Viewer on the Exchange server.
http://technet.microsoft.com/en-us/library/bb123681(v=exchg.150).aspx
Use the Test-MapiConnectivity cmdlet to verify server functionality by logging on to the mailbox that you specify. If you don't specify a mailbox, the cmdlet logs on to the SystemMailbox on the database that you specify.
Thanks.
Niko Cheng
TechNet Community Support
Similar Messages
-
Performance degradation after changing filesystemio_options from none to setall
Hi All,
We are facing performance degradation after changing filesystemio_options from none to setall on the two servers listed below.
Red Hat Enterprise Linux AS release 4 (Nahant Update 7) 2.6.9 55.ELhugemem (32-bit)
Red Hat Enterprise Linux Server release 5.2 (Tikanga) 2.6.18 92.1.10.el5 (64-bit)
We are seeing lots of disk I/O happening. We expected filesystemio_options=setall to improve performance, but it is degrading it. We are getting complaints about slowness.
Please let me know whether we need to set something else along with this, such as an optimizer parameter (e.g. optimizer_index_cost_adj, optimizer_index_caching).
Please help.
Hi Suraj,
<speculation>
You switched filesystemio_options to setall from none, so the most likely reason for performance degradation after switching to setall is the implementation of direct I/O. Direct I/O will skip the filesystem buffer cache and allow Oracle to read directly from disk to the database buffer cache. However, on a system where direct I/O is not implemented, which is what you had until you recently messed with that parameter, it's likely that you had an undersized database buffer cache, but that was OK, because many (most) of the physical I/Os your database was doing were actually being serviced by the O/S filesystem buffer cache. But you introduced direct I/O, and wiped out the ability of the O/S to service any physical I/Os from the filesystem buffer cache. This means that every cache miss on the database buffer cache turns into a real, physical, spin-the-disk, move-the-drive-head I/O. And you are suffering the performance consequences.
</speculation>
Ok, end of speculation. Now, assuming that what I've outlined above is actually going on, what to do? Why is direct I/O lower performing than buffered, non-direct I/O? Shouldn't its performance be superior?
Well, when you have an established system that's using buffered I/O, and you switch to direct I/O, you almost always will have to increase the size of the database buffer cache. The problem is that you took a huge chunk of memory away from the O/S, which it was using to buffer your I/Os and avoid physical I/O. So now you need to make up for it by increasing the size of the database buffer cache. You can do this without buying more memory for the box, because the O/S is no longer going to need to use so much memory for filesystem buffers.
So, what to do? Is it worth switching? Well, on balance, it makes sense to use direct I/O, and give Oracle a larger database buffer cache, for the simple fact that (particularly on a server that's dedicated to being an Oracle database server), Oracle has far more sophisticated caching algorithms, and a better understanding of the various types of data being cached, and so should be able to make more efficient use of the memory, than the (relatively) brain dead caching algorithms of the kernel and filesystem mechanisms.
But, once again, it all comes down to this:
What problem are you trying to solve? Did you have any I/O related issues? Do you have any compelling reason to implement direct I/O? Rule #1 is "if it ain't broke, don't fix it." Did you just violate rule #1? :-)
Finally, since you're on Linux, you can use the 'free' command to see how much memory is on the box, how much is free, and how much is dedicated to filesystem cache buffers. This response is already pretty long, so, I'm not going to get into details, however, if you're not familiar with the command, the results could be misleading. Read the man page, and try to be clear about understanding it before you make any assumptions about the output.
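To make the point about filesystem cache a bit more concrete, here is a rough sketch in Python (my choice of language, nothing Oracle-specific) that splits memory into "really free", "held as filesystem cache" and "everything else". It assumes the usual /proc/meminfo layout with MemTotal, MemFree, Buffers and Cached fields, which is where 'free' gets its numbers; the sample figures are made up for illustration, and the "other" bucket is only an approximation because it lumps process memory together with kernel usage.

```python
# Sketch: estimate how much RAM Linux is holding as filesystem cache.
# Assumes /proc/meminfo-style "Name:   value kB" lines. The SAMPLE
# numbers are invented for illustration, not taken from a real server.
import os


def parse_meminfo(text):
    """Return a dict mapping field name -> size in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def cache_summary(fields):
    """Split total memory into free, filesystem cache, and everything else."""
    total_kb = fields.get("MemTotal", 0)
    free_kb = fields.get("MemFree", 0)
    cache_kb = fields.get("Buffers", 0) + fields.get("Cached", 0)
    return {
        "total_kb": total_kb,
        "free_kb": free_kb,
        "fs_cache_kb": cache_kb,
        # Approximate: includes process memory and kernel overhead.
        "other_kb": total_kb - free_kb - cache_kb,
    }


SAMPLE = """\
MemTotal:     16000000 kB
MemFree:        500000 kB
Buffers:        300000 kB
Cached:        9200000 kB
"""

if __name__ == "__main__":
    text = SAMPLE
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as handle:
            text = handle.read()
    for key, value in cache_summary(parse_meminfo(text)).items():
        print(f"{key}: {value}")
```

On a box running buffered I/O, a large fs_cache_kb figure is the memory the O/S was using to absorb physical reads; that is roughly the budget you could consider handing to the database buffer cache after switching to direct I/O.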
Hope that helps,
-Mark
-
Alert: Health Set unhealthy - Clustering
We have SCOM 2012 R2 set up to monitor our Exchange 2013 CU5 environment, and a couple of times we have gotten this error message about our Clustering going into an unhealthy state. We have checked the FSW and everything seems OK on its end. I cannot find much out there on this message, so any help would be greatly appreciated:
Alert: Health Set unhealthy
Source: EXCHANGE04 - Clustering
Path: EXCHANGE04.company.com;EXCHANGE04.company.com
Last modified by: System
Last modified time: 8/24/2014 1:36:35 PM
Alert description: The Cluster Group has not been healthy for 7200 minutes. The most recent probe failure message is: Check 'Microsoft.Exchange.Monitoring.QuorumGroupCheck' thrown an Exception!
Exception - Microsoft.Exchange.Monitoring.ReplicationCheckFailedException: QuorumGroup has failed. Specific error is: Quorum resource 'Cluster Group' is not online on server 'exchange06'. Database availability group 'exchDAG' might not be reachable or may have lost redundancy. Error:
File Share Witness (\\FSW01.company.com\exchDAG.company.com): Offline is offline. Please verify that the Cluster service is running on the server.
at Microsoft.Exchange.Monitoring.ReplicationCheck.Fail(LocalizedString error)
at Microsoft.Exchange.Monitoring.QuorumGroupCheck.RunCheck()
at Microsoft.Exchange.Monitoring.DagMemberCheck.InternalRun()
at Microsoft.Exchange.Monitoring.ReplicationCheck.Run()
at Microsoft.Exchange.Monitoring.ActiveMonitoring.HighAvailability.Probes.ReplicationHealthChecksProbeBase.RunReplicationCheck(Type checkType)
Check 'Microsoft.Exchange.Monitoring.QuorumGroupCheck' did not Pass!
Detail Message - Quorum resource 'Cluster Group' is not online on server 'exchange06'. Database availability group 'exchDAG' might not be reachable or may have lost redundancy. Error:
File Share Witness (\\FSW01.company.com\exchDAG.company.com): Offline is offline. Please verify that the Cluster service is running on the server.
To add some additional information, when I look in Failover Cluster Manager this is what I see. I know that when we set up the servers, the correct FSW information was being displayed.
Hi,
According to the error message, "Offline is offline. Please verify that the Cluster service is running on the server.", I suggest double-checking whether the Cluster service is running. If it is not, please restart the service manually and check whether the issue persists.
Please also refer to the blog post below to double-check whether the FSW is online:
Verifying the file share witness server / directory in use for Exchange 2010
http://blogs.technet.com/b/timmcmic/archive/2012/03/12/verifying-the-file-share-witness-server-directory-in-use-for-exchange-2010.aspx
If there is nothing abnormal on the Exchange server, this seems to be an issue on the SCOM side. Please contact the SCOM forum so that you can get more professional suggestions. For your convenience:
http://social.technet.microsoft.com/Forums/systemcenter/en-US/home?category=systemcenteroperationsmanager
Thanks
Mavis
Mavis Huang
TechNet Community Support
-
Exchange 2013 CU2, Alert for OWA Health set unhealthy from SCOM 2012
I am facing an issue in Exchange 2013 CU2: I get this alert from SCOM 2012 at least 5-6 times a day, saying the OWA health set is unhealthy. I have done all the steps mentioned in the web link below. The authentication types for the OWA virtual directory are Integrated Windows and Basic.
I have 2 CAS servers, and this alert is generated from both of them.
http://technet.microsoft.com/en-us/library/ms.exch.scom.OWA(EXCHG.150).aspx?v=15.0.712.24
Alert: Health Set unhealthy
Source: EX-CAS - OWA
Path: EX-CAS;EX-CAS
Last modified by: System
Last modified time: 1/5/2014 8:15:08 PM
Alert description: Outlook Web Access logon is failing on ClientAccess server EX-CAS.
Availability has dropped to 0%. You can find protocol level traces for the failures on C:\Program Files\Microsoft\Exchange Server\V15\Logging\Monitoring\OWA\ClientAccessProbe.
Incident start time: 1/6/2014 4:05:08 AM
Last failed result:
Failing Component - Owa
Failure Reason - CafeFailure
Exception:
System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> Microsoft.Exchange.Net.MonitoringWebClient.ScenarioException:
Microsoft.Exchange.Net.MonitoringWebClient.ScenarioException:
Failure source: Owa
Failure reason: CafeFailure
Failing component:Owa
Exception hint: CafeErrorPage: CafeFailure Unauthorized Inner exception: Microsoft.Exchange.Net.MonitoringWebClient.CafeErrorPageException
ErrorPageFailureReason: CafeFailure, RequestFailureContext: FailurePoint=FrontEnd, HttpStatusCode=401, Error=Unauthorized, Details=, HttpProxySubErrorCode=, WebExceptionStatus=
Microsoft.Exchange.Net.MonitoringWebClient.CafeErrorPageException: An error occurred on the Client Access server while processing the request
WebExceptionStatus: Success
GET https://localhost/owa/ HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1; MSEXCHMON; ACTIVEMONITORING; OWACTP)
Accept: */*
Cache-Control: no-cache
X-OWA-ActionName: Monitoring
Cookie:
HTTP/1.1 401 Unauthorized
request-id: 211474d2-a43e-4fab-8038-3aab35353568
X-FailureContext: FrontEnd;401;VW5hdXRob3JpemVk;;;
Server: Microsoft-IIS/7.5
WWW-Authenticate: Negotiate,NTLM,Basic realm="localhost"
X-Powered-By: ASP.NET
X-FEServer: EX-CAS
Date: Mon, 06 Jan 2014 04:14:47 GMT
Content-Length: 0
Response time: 0s
---> Microsoft.Exchange.Net.MonitoringWebClient.CafeErrorPageException: Microsoft.Exchange.Net.MonitoringWebClient.CafeErrorPageException
ErrorPageFailureReason: CafeFailure, RequestFailureContext: FailurePoint=FrontEnd, HttpStatusCode=401, Error=Unauthorized, Details=, HttpProxySubErrorCode=, WebExceptionStatus=
Microsoft.Exchange.Net.MonitoringWebClient.CafeErrorPageException: An error occurred on the Client Access server while processing the request
WebExceptionStatus: Success
GET https://localhost/owa/ HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1; MSEXCHMON; ACTIVEMONITORING; OWACTP)
Accept: */*
Cache-Control: no-cache
X-OWA-ActionName: Monitoring
Cookie:
HTTP/1.1 401 Unauthorized
request-id: 211474d2-a43e-4fab-8038-3aab35353568
X-FailureContext: FrontEnd;401;VW5hdXRob3JpemVk;;;
Server: Microsoft-IIS/7.5
WWW-Authenticate: Negotiate,NTLM,Basic realm="localhost"
X-Powered-By: ASP.NET
X-FEServer: EX-CAS
Date: Mon, 06 Jan 2014 04:14:47 GMT
Content-Length: 0
Response time: 0s
--- End of inner exception stack trace ---
at Microsoft.Exchange.Net.MonitoringWebClient.BaseExceptionAnalyzer.Analyze(TestId currentTestStep, HttpWebRequestWrapper request, HttpWebResponseWrapper response, Exception exception, Action`1 trackingDelegate)
at Microsoft.Exchange.Net.MonitoringWebClient.HttpSession.AnalyzeResponse[T](HttpWebRequestWrapper request, HttpWebResponseWrapper response, Exception exception, HttpStatusCode[] expectedStatusCodes, Func`2 processResponse)
at Microsoft.Exchange.Net.MonitoringWebClient.HttpSession.EndSend[T](IAsyncResult result, HttpStatusCode[] expectedStatusCodes, Func`2 processResponse, Boolean fireResponseReceivedEvent)
at Microsoft.Exchange.Net.MonitoringWebClient.HttpSession.EndGet[T](IAsyncResult result, HttpStatusCode[] expectedStatusCodes, Func`2 processResponse)
at Microsoft.Exchange.Net.MonitoringWebClient.Authenticate.AuthenticationResponseReceived(IAsyncResult result)
--- End of inner exception stack trace ---
at Microsoft.Exchange.Net.MonitoringWebClient.BaseTestStep.EndExecute(IAsyncResult result)
at Microsoft.Exchange.Net.MonitoringWebClient.Owa.OwaLogin.AuthenticationCompleted(IAsyncResult result)
--- End of inner exception stack trace ---
at Microsoft.Exchange.Net.MonitoringWebClient.BaseTestStep.EndExecute(IAsyncResult result)
at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Bool
States of all monitors within the health set:
Note: Data may be stale. To get current data, run: Get-ServerHealth -Identity 'EX-CAS' -HealthSet 'OWA'
State          Name           TargetResource  HealthSet  AlertValue  ServerComponent
NotApplicable  OwaCtpMonitor                  OWA        Unhealthy   None
States of all health sets:
Note: Data may be stale. To get current data, run: Get-HealthReport -Identity 'EX-CAS'
State          HealthSet                   AlertValue  LastTransitionTime      MonitorCount
NotApplicable  ActiveSync                  Healthy     1/3/2014 5:21:13 AM     2
NotApplicable  AD                          Healthy     11/24/2013 6:54:18 AM   10
NotApplicable  ECP                         Healthy     1/5/2014 3:03:05 AM     1
Online         Autodiscover.Proxy          Healthy     11/20/2013 10:06:37 AM  1
NotApplicable  Autodiscover                Healthy     1/3/2014 10:18:17 PM    2
Online         ActiveSync.Proxy            Healthy     11/20/2013 10:06:37 AM  1
Online         ECP.Proxy                   Healthy     11/21/2013 6:16:08 PM   4
Online         EWS.Proxy                   Healthy     11/20/2013 10:06:37 AM  1
Online         OutlookMapi.Proxy           Healthy     11/24/2013 6:54:28 AM   4
Online         OAB.Proxy                   Healthy     11/19/2013 7:14:34 PM   1
Online         OWA.Proxy                   Healthy     11/20/2013 10:06:37 AM  2
NotApplicable  EDS                         Healthy     1/3/2014 5:19:56 AM     10
Online         RPS.Proxy                   Healthy     1/3/2014 5:21:27 AM     13
Online         RWS.Proxy                   Healthy     1/3/2014 5:20:09 AM     10
Online         Outlook.Proxy               Healthy     1/3/2014 5:21:12 AM     4
NotApplicable  EWS                         Healthy     1/3/2014 10:18:17 PM    2
Online         FrontendTransport           Healthy     1/5/2014 3:47:09 AM     11
Online         HubTransport                Healthy     1/5/2014 3:47:09 AM     29
NotApplicable  Monitoring                  Unhealthy   1/5/2014 4:05:57 AM     9
NotApplicable  DataProtection              Healthy     1/3/2014 5:25:42 AM     1
NotApplicable  Network                     Healthy     1/4/2014 1:51:16 PM     1
NotApplicable  OWA                         Unhealthy   1/5/2014 8:05:08 PM     1
NotApplicable  FIPS                        Healthy     1/3/2014 5:21:12 AM     3
Online         Transport                   Healthy     1/5/2014 4:11:00 AM     9
NotApplicable  RPS                         Healthy     11/20/2013 10:07:12 AM  2
NotApplicable  Compliance                  Healthy     11/20/2013 10:08:10 AM  2
NotApplicable  Outlook                     Healthy     11/21/2013 6:12:54 PM   2
Online         UM.CallRouter               Healthy     1/5/2014 3:47:10 AM     7
NotApplicable  UserThrottling              Healthy     1/5/2014 4:16:42 AM     7
NotApplicable  Search                      Healthy     11/24/2013 6:55:06 AM   9
NotApplicable  AntiSpam                    Healthy     1/3/2014 5:16:43 AM     3
NotApplicable  Security                    Healthy     1/3/2014 5:19:28 AM     3
NotApplicable  IMAP.Protocol               Healthy     1/3/2014 5:21:14 AM     3
NotApplicable  Datamining                  Healthy     1/3/2014 5:18:34 AM     3
NotApplicable  Provisioning                Healthy     1/3/2014 5:19:56 AM     3
NotApplicable  POP.Protocol                Healthy     1/3/2014 5:20:44 AM     3
NotApplicable  Outlook.Protocol            Healthy     1/3/2014 5:19:46 AM     3
NotApplicable  ProcessIsolation            Healthy     1/3/2014 5:19:26 AM     9
NotApplicable  Store                       Healthy     1/3/2014 5:20:38 AM     6
NotApplicable  TransportSync               Healthy     11/24/2013 6:53:09 AM   3
NotApplicable  MailboxTransport            Healthy     1/3/2014 5:21:11 AM     6
NotApplicable  EventAssistants             Healthy     11/21/2013 6:22:01 PM   2
NotApplicable  MRS                         Healthy     1/3/2014 5:20:29 AM     3
NotApplicable  MessageTracing              Healthy     1/3/2014 5:18:15 AM     3
NotApplicable  CentralAdmin                Healthy     1/3/2014 5:17:25 AM     3
NotApplicable  UM.Protocol                 Healthy     1/3/2014 5:17:08 AM     3
NotApplicable  Autodiscover.Protocol       Healthy     1/3/2014 5:17:13 AM     3
NotApplicable  OAB                         Healthy     1/3/2014 5:20:51 AM     3
NotApplicable  OWA.Protocol                Healthy     1/3/2014 5:20:52 AM     3
NotApplicable  Calendaring                 Healthy     11/24/2013 6:56:59 AM   3
NotApplicable  PushNotifications.Protocol  Healthy     11/21/2013 6:16:05 PM   3
NotApplicable  EWS.Protocol                Healthy     1/3/2014 5:19:07 AM     3
NotApplicable  ActiveSync.Protocol         Healthy     1/3/2014 5:20:16 AM     3
NotApplicable  RemoteMonitoring            Healthy     1/5/2014 3:47:09 AM     3
Is there any solution for this alert? How can I rectify it? OWA itself is running perfectly for all users.
Hi,
Sorry for the late reply.
Do you have Exchange 2010 coexistence?
If so, there is a known issue that may apply:
Release Notes for Exchange 2013
http://technet.microsoft.com/en-us/library/jj150489%28v=exchg.150%29.aspx
Please note the "Exchange 2010 coexistence" section.
If it is not related to your problem, please check the IIS log.
If there is a detailed error code, such as 401.1 or 401.2, please let me know.
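While checking the logs, it may also help to decode the X-FailureContext response header from the trace in the alert. The sketch below (Python, purely illustrative) splits the header on semicolons and base64-decodes the third field. The field names are an assumption: they are borrowed from the RequestFailureContext line in the same trace (FailurePoint, HttpStatusCode, Error, Details, HttpProxySubErrorCode, WebExceptionStatus), which appears to list the same six values in the same order. This is not a documented format.

```python
# Sketch: decode the X-FailureContext header seen in the probe trace.
# ASSUMPTION: the six semicolon-separated fields line up with the names
# in the RequestFailureContext log line; this mapping is inferred from
# the trace, not taken from Exchange documentation.
import base64

FIELD_NAMES = [
    "FailurePoint",
    "HttpStatusCode",
    "Error",
    "Details",
    "HttpProxySubErrorCode",
    "WebExceptionStatus",
]


def decode_failure_context(header_value):
    """Return a dict of field name -> value, with Error base64-decoded."""
    parts = header_value.split(";")
    # Pad or trim so that exactly one value exists per field name.
    parts = (parts + [""] * len(FIELD_NAMES))[: len(FIELD_NAMES)]
    result = dict(zip(FIELD_NAMES, parts))
    if result["Error"]:
        try:
            raw = base64.b64decode(result["Error"], validate=True)
            result["Error"] = raw.decode("utf-8")
        except (ValueError, UnicodeDecodeError):
            # Leave the value untouched if it is not valid base64 text.
            pass
    return result


if __name__ == "__main__":
    context = decode_failure_context("FrontEnd;401;VW5hdXRob3JpemVk;;;")
    for name, value in context.items():
        print(f"{name}={value}")
```

For the header in this alert, the decoded error text is "Unauthorized" at the FrontEnd failure point, which matches the 401 response to the probe's request against https://localhost/owa/.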
Hope it is helpful
Thanks
Mavis
Mavis Huang
TechNet Community Support
-
Health set components seems to be unhealthy
Hi,
In my environment, several health set components appear to be unhealthy, but there is no problem on the user side.
Below are the components:
HealthSet AlertValue
MailboxTransport Unhealthy
HubTransport Unhealthy
ECP Unhealthy
Search Unhealthy
Store Unhealthy
MSExchangeCertif... Disabled
DataProtection Unhealthy
RPS Unhealthy
RWS Unhealthy
Compliance Unhealthy
Outlook Unhealthy
Can somebody help me through this please.
Hello,
I think you can compare the health reports with the application log.
Are there any warnings or errors reported in it about these unhealthy items? If not, I think we can safely ignore these errors.
Thanks,
Simon Wu
TechNet Community Support
-
FrontendTransport health set unhealthy (OnPremisesSmtpClientSubmissionMonitor)
FrontendTransport health set unhealthy (OnPremisesSmtpClientSubmissionMonitor) - The client submission probe failed 3 times over 15 minutes.
It seems these alerts have started coming in for some of the servers where the Mailbox and CAS roles are installed together. When I checked the queues, all seemed to be fine. I performed the steps below, but the issue was not fixed:
1. The "Invoke-MonitoringProbe" command doesn't work.
2. Restarting the "health manager service" didn't work either.
The alert value is still in the Unhealthy state. Has anyone come across the same issue? If so, can you share the steps that we have to take?
Your answers are much appreciated!
Hi,
Please check the Monitor Result and Probe Result in the following path and see if there is any related message.
Event Viewer\Applications and Services Logs\Microsoft\Exchange\ActiveMonitoring\ProbeResult( or MonitorResult).
Based on your description, everything works well except this alert. However, there is a way to hide the alert by overriding the monitor using the command below:
Add-GlobalMonitoringOverride -Identity "FrontendTransport\OnPremisesSmtpClientSubmissionMonitor" -PropertyName Enabled -PropertyValue 0 -ItemType Monitor -ApplyVersion "version"
Hope this is helpful to you.
Best regards,
Belinda Ma
TechNet Community Support
-
Degraded RAID set (mirror)
I am running a pair of 2 GB external drives in RAID 1 (mirroring) using the OS X Disk Utility. Recently I noticed that the set shows its RAID status as Degraded and one of the two drives is indicated as "Missing," which keeps the "rebuild raid set" button grey. However, I can verify each of them separately and they appear to be okay. Only one of the drives appears to be partitioned correctly; the "missing" drive simply shows a RAID slice for the entire 2 GB.
I would recreate the mirror set, except that I don't have anywhere to store the 1.3 GB still on the good drive, and I believe I cannot create the set again without erasing the contents. (I back up to the RAID set as well as use it for un-backed up storage, which I think is safe as long as the RAID set is working.)
Any ideas how to get the "missing drive" to reappear, so the system can rebuild the set? Or any other ideas to get out of this problem? Thanks.
It doesn't change the problem, but obviously I meant TB (not GB).
-
Hello Support Community,
I have a 4 bay RAID set that all of a sudden is showing as degraded. I have two pairs of disks each striped, and then have those two pairs mirrored. Not sure if this is a correct RAID format, but it is what it is right now. So both striped sets say online, with no problems, but my mirrored set of those two pairs shows that it is degraded. See below.
Any thoughts as to what might be happening? I have the options set to automatically rebuild the RAID. How do I know this is happening? How long should I expect this to take? It's a 4TB RAID in its current config, and there is about 3.7TB of data on this RAID. I have everything backed up in two other locations. Am I better off starting from scratch? Or should I just let this thing run for 2 weeks and see what happens? Any help would be greatly appreciated.
Thanks
Dave
No, I was doing all of this in Disk Utility. I left the array running overnight, and it's back online now. Not sure what got screwed up.
Thanks
-
Hi All,
Dual Core 2.0 running 10.4.2. I set up a raid set and one of the drives almost immediately reported a SMART failure. So I replaced it. No problem, no downtime, even. The machine functioned fine with just the one drive.
Now...
Bought a replacement second drive. Added it to the RAID set, rebuilt it. No problem, but the RAID set still reports as degraded. I cannot delete the damaged drive because I don't have it anymore (hindsight is 20/20). How can I delete the non-existent drive?
Thanks,
Danny
Dualcore 2.0 G5 Mac OS X (10.4.2)
Yeah, but if you've only got 2 slices and one of them
is out to lunch, well, it's not rocket science to
know what your risk is ...
Well, actually, one failed and has been replaced. It was pretty easy to add the new drive to the RAID set (just drag and drop into the RAID and then rebuild it). The machine worked fine without the failed drive hence...
What worries me is that I don't know for sure
if the RAID array is actually working as designed and
I haven't found a definitive statement about what
exactly 'degraded status' means. Does it mean
'not working as well as it should but still doing the
job' or does it mean 'this RAID array isn't doing
diddly and sometime soon you're gonna be up the
creek.'
As you could see, the RAID was working very well. One drive died and the machine continued working fine, so there was NO downtime (except to switch off the G5 and yank out the failed drive). I'm guessing degraded means that one of the registered slices is missing. But if you have another registered slice, then you're fine.
More specifically, if my primary drive actually does
die, is there a complete dupe on the second drive in
the array or not?
As in my example, that's exactly right. The hard drives are exact copies.
I guess I should go ahead and do a hard backup on an
external drive and then just pull out the primary
drive (it's hot-swappable) and see what happens.
Yes, RAID is not back-up. If you have a corrupted directory structure due to software, then you would have to rely on a back-up. But it's nice to have both, really, as the most common problem with disks is hardware failure and this minimizes problems involved.
To tell you the truth, though, I'm not sure I'd bother again. I find that using psync (available from bombich.com under the Carbon Copy cloner section) works fine. It creates a fully working system disk and can be set to clone the start-up drive daily. If the main disk fails, most people won't even notice that they're starting up from the second one.
So back to the question: how to remove this degraded status?
-
I have, or should I say had, a mirrored RAID set of two 250GB drives. It would no longer boot, so I bought a new drive (same size) and attempted to rebuild the set. This went fine for a time but then failed. Now I can no longer see the RAID set. I have tried booting from the server CD, and it will stay booted just long enough to show me the RAID set is gone, or to open Terminal, and then it shuts itself down.
I desperately need the data on this server. My most recent backup to tape was the end of January.
G5 Server Mac OS X (10.3.9)
Both hard drives were ruined by a power surge.
-
RAID degraded - RedundancyScrub error message
Hi all,
Even though my problem is not on an X-Serve box but on a MacPro (2008), I run OSX Server 10.5.4 and have not found anything in the MacPro or Server OS forums, hence my post here in the hope of an experienced admin being able to help here.
After having a new RAID 5 setup on 4x500GB Seagate SATA drives and Apple's RAID card running fine for over 2 months, I ran the 10.5.4 update (after much deliberation on the pros and cons) and promptly had this issue. I only found out a few days after running the update, though, and after going through all the logs, I could pin the problem to the time immediately after the OS update. I have searched high and low in the RAID and OS Server forums, but haven't found anything there to resolve this error message, with or without the CLI:
The "RedundancyScrub" command could not be executed. (The request failed because a volume service is currently running)
Yes, have tried to verify RAID, only to get an error popup when trying to verify...
Could not verify the RAID set "RS1". there was a problem communicating with the device.
Since it is basically still running ok, but degraded, I need to do something, even if it means a wipe and re-install of the entire Server OS and RAID setup. I have had a few issues with Leopard anyway, and was thinking of doing a final clean install when .5 update is out that may tidy up a few more issues. But, any help in the meantime from an experienced RAID or X-Serve admin would be much appreciated.
Thanks
Chris
I had to go on to Apple's list to find this, sorry about the length... it fixed my problem.
A client of mine accidentally removed a drive from his RAID-5
array on an Intel XServe. He reinserted the drive and all three 1
TB drives show up as good, but the set is showing as degraded.
The log shows Degraded RAID Set R0-1 No Spare Available for
Rebuild.
When I attempt to run a verify, it says "Could not verify the
RAIDset R0-1, There was a problem communicating with the device."
After that the log shows "The redundancy Scrub command could not
be executed. The request failed because a volume service is
currently running."
Googling this brings up a similar problem that two other people
are having, but no one has a solution. Have any of you run into
this?
Hi, Mike,
If you just replaced Apple Drive modules, the new one still will
not be as spare drive for your raid set. You have to tell RAID
card a spare drive came.
You can do for this both CLI and GUI.
CLI:
- - use 'raidutil' command.
$ sudo raidutil modify drive --addglobalspare -d <DriveBayNumber>.
For more information, see raidutil(8).
GUI:
- - Launch /Applications/Utilities/RAID Utility.app
Select 'Make Spare...' from menu 'RAID' after you selected the
new drive.
Cheers,
- -takanori
I think you misunderstand what happened, perhaps I should have been
more specific.
The person took a drive out of a working RAID-5 while it was
running. Then he replaced it. He pulled the drive from the wrong
XServe in a rack, intending to pull a bad drive from a different
server.
Now we are unable to clear the "Degraded" message that it has. My
assumption would be that it should have simply gone into a degraded
state and then rebuilt itself automatically.
Perhaps the drive should be pulled again and formatted, and then
inserted and added as a spare?
Hi, Mike,
Ok. I got your condition.
I had same situation couple weeks ago. I've also thought when drive
module of raid set set back, Apple RAID Card would start rebuilding
immediately . But it would not. So you have to mark spare to the
drive module which was pulled out accidentally and set back same bay
of xserve. You don't have to format it again. You can mark it spare
drive right away.
In my case, Apple RAID Card said it was a part of another raid set
when I set the drive module back after my staff pulled it out
carelessly. So Apple RAID Card never start rebuilding using with a
part of raid set. I think it is good as fool-safe mechanism.
The way of marking spare is as same as what I wrote above. If you
decide to use CLI, you may have to use another subcommand of
raidutil before making it spare. I repaired my raid set of xserve
with RAID Utility.app just select 'Make Spare...' from menu 'RAID'.
Careful and cheers,
- -takanori
Thank you Takanori, that was the thing to do. The RAID is rebuilding
now.
-
Suggestion for fixing broken raid set
Jesus. Again. Twice in 6 months. Raid set failure.
2:58pm today : Drive 3:50014ee2aede46eb missing - Previous drive status was inuse
2:59pm today : Degraded RAID set RS1
2:59pm today : Degraded RAID set RS1 - No spare available for rebuild
Jesus...
After launching the RAID Utility, I notice that one drive is actually missing from the drive bays. It's just gone, and I have not done anything to it. This happened last December as well. Hard booting the drive (pull out, push back in) worked last time to get the drive online, but jesus. Twice? I should maybe replace the drive? I am using the Apple RAID Card; people say it tends to be pretty strict about the drive's state, but why the **** does it keep disappearing from the system?!
I was already on the phone with one Apple consultant about this, and I think everything is pretty OK. I have good backups, and gladly, the OS RAID set is ok. Only our accounts and work files were in there, and all is secured. But this is really stressful.. Feels like I can't trust these drives one bit. And they are good drives, standard hardware what comes with Mac Pro.
Just when I was starting to think that everything is finally working smoothly..
Any recommendations about how to act now? I know what I have to do, but it would be encouraging to get some steps on how to fix it: working order, etc.
Everything works tho, taking one separate set of backups at the moment just in case. I just need to get my act together and fix it. God I am annoyed tho.
Good weekend to everyone tho. Comments are appreciated.
Yeah, I already ordered a new drive.
I can't do any tests for the bad drive cos it just disappeared from the system totally. I guess it could come online if I hard boot it again (like when the raid set broke last december), but I don't feel like doing it before I get the new drive. Need to analyze it on another computer.
The consultant I talked with earlier mentioned that the RAID card is pretty strict about the condition of the drive. But I would like to know if that is why the drive keeps disappearing, and whether it really ejects it from the system entirely. I heard that in some cases the RAID Utility just shows the red light that indicates the drive state if there is a problem, but for me the drive is just gone totally.
Hope that the new drive arrives soon. Will the RAID rebuild itself if I insert the new drive and mark it as a global spare? That's what I understood from reading the RAID Utility manual.
-
RWS.Proxy and ECP.Proxy health checks, localhost, and SSL
RWS.Proxy and ECP.Proxy health sets are both failing. In both of the errors, I find the following:
[000.000] Starting HTTP request task
[000.000] Waiting 59000 ms
[000.000] Issuing GET against https://localhost/ecp/
[000.000] Awaiting GET response
[000.000] Performing SSL validation
[000.000] Performing SSL validation
[000.000] Failed with exception: The underlying connection was closed: An unexpected error occurred on a receive.
[000.000] Starting HTTP request task
[000.000] Waiting 59000 ms
[000.000] Issuing GET against https://localhost/ecp/ReportingWebService/
[000.000] Awaiting GET response
[000.000] Performing SSL validation
[000.000] Performing SSL validation
[000.000] Failed with exception: The underlying connection was closed: An unexpected error occurred on a receive.
We require SSL on all connections. We use a third party certificate with multiple SANs. Since the probe is trying to use https://localhost, it fails because the name doesn't match.
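To illustrate the mismatch described above, here is a minimal Python sketch of how a client decides whether a requested host name is covered by a certificate's SAN list. It assumes the failure really is a name mismatch, as the post suggests. The function name `hostname_matches` and the SAN entries are hypothetical, since the real certificate's names are not given, and the matching is a simplified version of the usual rules (exact match, or a `*` wildcard as the whole left-most label).

```python
from typing import Iterable


def hostname_matches(hostname: str, san_entries: Iterable[str]) -> bool:
    """Return True if hostname is covered by any DNS name in san_entries.

    Simplified matching: case-insensitive exact match, or a '*' wildcard
    that stands in for exactly one left-most label.
    """
    host_labels = hostname.lower().rstrip(".").split(".")
    for entry in san_entries:
        san_labels = entry.lower().rstrip(".").split(".")
        if len(san_labels) != len(host_labels):
            continue  # a wildcard never spans more than one label
        if san_labels[0] == "*" and len(san_labels) > 1:
            if san_labels[1:] == host_labels[1:]:
                return True
        elif san_labels == host_labels:
            return True
    return False


# Hypothetical SAN list for illustration only.
sans = ["mail.example.edu", "autodiscover.example.edu"]
print(hostname_matches("mail.example.edu", sans))  # True
print(hostname_matches("localhost", sans))         # False
```

Under these assumptions, no public third-party certificate would list `localhost`, so a probe that requests https://localhost cannot pass strict name validation regardless of how many SANs the certificate carries.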
I figure I have a few options. First, is there a way to change the URL that the probe checks? This seems to me to be the most correct fix. Second, could I alter the binding of the site so that the localhost hostname uses a dedicated, self-signed, trusted cert? Last, is there any way to simply disable the specific probes? We're a single-server, low-volume setup and I'm not convinced that I need the probes anyway.
Is this a common issue? Apart from the warnings that SCOM throws at me, it is also causing a large volume of logs to be generated.
Justin Cervero - MS Enterprise Admin - Appalachian State University
Hi,
I am afraid it's hard-coded. Just like the Test-OutlookWebServices cmdlet, the probe also tries "localhost" and reports a certificate host name mismatch error.
We can safely ignore this report.
Thanks,
Simon Wu
TechNet Community Support -
Mailbox Transport health state unhealthy
Hi team,
I checked my Exchange 2013 Mailbox server, and the Mailbox Transport health set has become unhealthy.
After looking into the Mailbox Transport details, I found some errors. Here are the details:
[PS] C:\Windows\system32>get-healthreport -server BCEJKT-MBX2-SVR | where {$_.alertvalue -ne "healthy"} | ft -auto
Server  State          HealthSet                        AlertValue  LastTransitionTime   MonitorCount
        NotApplicable  FIPS                             Unhealthy   8/6/2014 3:41:14 AM  22
        NotApplicable  Monitoring                       Unhealthy   8/6/2014 3:56:32 AM  9
        NotApplicable  MailboxTransport                 Unhealthy   8/6/2014 4:11:41 AM  56
        NotApplicable  MSExchangeCertificateDeployment  Disabled    1/1/0001 7:00:00 AM  2
Server  State          Name                                                    TargetResource    HealthSetName     AlertValue  ServerComponent
        NotApplicable  Mapi.Submit.Monitor                                     MailboxTransport  MailboxTransport  Unhealthy   None
        NotApplicable  MailboxDeliveryAvailabilityMonitor                                        MailboxTransport  Unhealthy   None
        NotApplicable  TransportDeliveryFailuresDeliveryStoreDriver560Monitor                    MailboxTransport  Disabled    None
What do these errors mean?
And why is the state "NotApplicable"?
Is any service having trouble?
Please give me details :)
Thanks
Regards
Hi,
There is no official document explaining the state "NotApplicable". In all the related articles about health sets that I could find, the state is always NotApplicable.
According to the Get-HealthReport output, the FIPS, Monitoring and MailboxTransport health sets are unhealthy. Please run the Test-ServiceHealth cmdlet and check the result.
Besides, please check the application log and system log for events related to this feature.
Best regards,
Belinda
Belinda Ma
TechNet Community Support -
Need help with RAID Card and degraded Raid-5 errors
Dear all,
I recently purchased a used Apple RAID card for my 2008 Mac Pro 8-Core. The installation went smoothly: the card was recognized immediately and the battery was reconditioned within one night.
So I started setting up a RAID set with the 4 identical drives that I had already used before as a software RAID. But each time the RAID level 5 volume is created, the status turns red somewhat later and the RAID is listed as "degraded"!
A closer look at the log reveals:
19:42:54 Drive carrier 00:01 inserted
19:42:27 Background task aborted: Task=Init,Scope=DRVGRP,Group=RS1
19:42:27 Degraded RAID set RS1 - No spare available for rebuild
19:42:26 Degraded RAID set RS1
19:42:22 Drive carrier 00:01 removed
15:10:57 Created volume “R1V1” on RAID set “RS1”
So it seems that the drive in Bay 1 somehow gets lost (removed) a few hours after the volume is created, and shortly afterwards it is "reinserted"...
Of course, the drive is NOT removed; nobody touched the Mac Pro! I also repeated the same procedure 3 times and the result was always the same.
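As a rough way to check this reading of the log, the following Python sketch pairs each "removed" event with the next "inserted" event for the same bay and reports how long the carrier was gone. The helper name `carrier_dropouts` is hypothetical, and the sketch assumes the log is listed newest-first with all timestamps on the same day, as in the excerpt above.

```python
import re
from datetime import datetime, timedelta

# Log excerpt from the post, newest entry first.
LOG = """\
19:42:54 Drive carrier 00:01 inserted
19:42:27 Background task aborted: Task=Init,Scope=DRVGRP,Group=RS1
19:42:27 Degraded RAID set RS1 - No spare available for rebuild
19:42:26 Degraded RAID set RS1
19:42:22 Drive carrier 00:01 removed
15:10:57 Created volume “R1V1” on RAID set “RS1”
"""

CARRIER = re.compile(r"^(\d{2}:\d{2}:\d{2}) Drive carrier (\S+) (removed|inserted)$")


def carrier_dropouts(log_text):
    """Return (bay, removed_at, inserted_at, gap) for each remove/insert pair."""
    pending = {}   # bay -> time of the most recent unmatched removal
    dropouts = []
    # Walk the log oldest-first so a removal is seen before its reinsertion.
    for line in reversed(log_text.strip().splitlines()):
        match = CARRIER.match(line.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        bay, event = match.group(2), match.group(3)
        if event == "removed":
            pending[bay] = stamp
        elif bay in pending:
            removed = pending.pop(bay)
            dropouts.append((bay, removed.time(), stamp.time(), stamp - removed))
    return dropouts


for bay, removed, inserted, gap in carrier_dropouts(LOG):
    print(f"bay {bay}: removed {removed}, back {inserted}, offline {gap.total_seconds():.0f} s")
# prints: bay 00:01: removed 19:42:22, back 19:42:54, offline 32 s
```

A dropout of about half a minute with nobody touching the machine is consistent with the card losing contact with the drive rather than a physical removal, though the log alone cannot say why.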
I also tried setting up JBOD and different RAID levels, which all work without a problem. Only when I choose RAID 5 (which is what I bought the card for) does the problem reappear.
Does anyone have a solution or hint for this problem? Many thanks in advance!
One drive completely broke down later. I replaced that drive and the problem has been gone since!