Ambiguous RAID failure

Early 2008 Mac Pro with Apple RAID card and 4 x 1TB drives installed.
I've had yet another RAID failure on my system (it's happened several times before) but this time the diagnostic is ambiguous and I need to make absolutely sure what's happened before I try to recover from it.
Overnight the RAID Utility log reported that Drive 3 had failed, that Raid Set RS1 was now degraded, and that there was no spare available for rebuild.
When I look now in RAID Utility at the status of the drives and of the array, all four drives show "green" (SMART: Verified and Status: Good), and Raid Set RS1 is "Viable (Degraded)". But it shows the drives in Bays 2, 3, 4 as "Assigned" to Raid Set RS1, while the drive in Bay 1 is not: it shows as "Roaming".
I'm fairly sure that one of the drives actually is problematic, because I've been having increasingly frequent episodes of freezing and non-responsiveness on the system (spinning beachball). In the past couple of days it got so bad that it was difficult to do anything at all following a restart; the freeze/beachball happened very soon after. I remember now that I had exactly this symptom in the past, just prior to a drive failure that RAID Utility reported.
So I guess I need to replace one of the drives, mark it as "Spare" in RAID Utility, and let the array rebuild.
But WHICH drive should I replace? The log says that Drive 3 failed (I'm assuming that "Drive 3" is the drive in "Bay 3"), but now that drive shows as "good"--as do all four drives. It's Drive 1 (i.e. the drive in Bay 1) that's been taken out of the array; drives 2, 3, and 4 are in Raid Set RS1. Is that a red herring? Is it possible that Drive 1 is bad even though the report was about Drive 3? (Drive 1 is the only drive that has never been replaced at any time in the four years since I got this system.)
I think/fear that if I replace Drive 3, I'll blow away the array.
So it seems to me that I should rebuild the array by marking Drive 1 as spare (since it's the only drive that's unassigned), wait for it to complete, and then replace Drive 3 and rebuild again. Or maybe I should just replace Drive 1 pre-emptively.
I don't know, but it takes a full 72 hours for the re-build to complete, a nerve-wracking time because throughout it the system is vulnerable to a second drive failure, so I would prefer not to have to do it multiple times.
Can someone please tell me in detail what the safest/most correct way is to proceed in order to recover from this?
Thanks.

Thanks, but it doesn't seem to be related to swapping/paging or to CPU usage by a runaway process. The system always seems to have plenty of RAM (18 GB is installed) and neither Activity Monitor nor iStat Menus ever seems to show anything obviously suspicious (nor does "top" when I run that). No swapping, no processes consuming all the CPU (also, it's an 8-core machine).
What typically happens is that the system will seem fine, but then running a new command, or (e.g.) asking my IMAP client to move some files, provokes a hang: everything locks up, spinning beachball, etc. Sometimes this is preceded by an obvious degradation in performance, but sometimes not.
It "feels" disk-related and in the past I had similar symptoms preceding a disk failure (as I did before replacing the drive a few days ago). But I can't see anything relevant in the log files, and in the past, when a disk failed, I didn't see clear evidence of this in the logs until RAID Utility declared the failure--nothing previous to that.
Is there some way to increase the verbosity of the disk-related logging? If a disk really is problematic, even if the problems are recoverable after retries, you'd think that would be detectable by the system and that it could be logged.
The only other things I can think of trying right now are: (1) replacing the remaining drives in the RAID Array one by one, starting with the drive in Bay 3 (because of the earlier ambiguity), and/or (2) disconnecting as much as possible from the system and seeing if it becomes more stable in that configuration. Though I've already mostly done (2) without any obvious improvement.
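One way to collect evidence when the system logs stay silent is to time small synchronous writes on the suspect volume and record the slow ones. The following is only a minimal sketch in Python: the function names `probe_write_latency` and `find_stalls` and the one-second threshold are hypothetical choices, not part of any Apple tool, and the timings reflect only what the OS sees (a RAID card's write cache may hide a slow drive behind it).

```python
import os
import tempfile
import time


def probe_write_latency(directory, payload=b"x" * 4096):
    """Write a small temp file in `directory`, fsync it, return elapsed seconds."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.monotonic()
        os.write(fd, payload)
        os.fsync(fd)  # ask the OS to push the data out to the device
        return time.monotonic() - start
    finally:
        os.close(fd)
        os.remove(path)


def find_stalls(samples, threshold_s=1.0):
    """Return the (timestamp, latency) pairs slower than the threshold."""
    return [(ts, lat) for ts, lat in samples if lat > threshold_s]
```

Calling `probe_write_latency` on the RAID volume once a minute and appending `(time.time(), latency)` to a list gives `find_stalls` something to filter; a cluster of multi-second writes shortly before a hang would at least support the suspicion that the hangs are disk-related.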

Similar Messages

Windows 7 has no notification of RAID failure?

This is a repost of this thread: http://social.technet.microsoft.com/Forums/en-US/w7itprogeneral/thread/a77c6445-e504-4394-a945-7a6c096e0871
Per Technet, being a subscriber guarantees me a response from a Microsoft Engineer within two business days, but the previous thread was not answered satisfactorily, and now Microsoft seems to be ignoring it. Since my previous thread went unresolved
and this is of critical importance, I chose to repost this and get a real answer from Microsoft.
Having Software RAID as a built-in option seems like a great idea. Hard disks are notoriously unreliable, and they carry the most important part of our technological world: our data. Being able to easily build a software RAID from two cheap 1TB drives gives me so much peace of mind ... or does it?
I've noticed my software RAID seems to be periodically losing sync.* This is worrisome (maybe a drive is failing?) but not nearly as worrisome as HOW I noticed it. For some reason, after cruising along smoothly for several months with my RAID 1, I decided
to check on its status in Computer Management -> Disk Management. And, oh nos! there was my software RAID with a Failed Redundancy Error. I had to "Reactivate Disk" and it started resynching. I decided to check again every so often and everything was fine
again for a while until I lost synch again.
Again, my losing synch is not the point of this post. My point is why was there NO NOTIFICATION of my synchronization loss? I have NO IDEA how long my drives had lost synchronization before I stumbled upon that fact the first time. What is the point of software
RAID if Windows does not inform you that it has failed? I could lose one drive, go cruising along for months thinking I'm fine, then lose the other drive and POW! I'm left with nothing due to a false sense of security (please no lectures on RAID being for
reducing downtime and not for use as a backup solution - I KNOW).
I just had another loss of synchronization, and I checked around to see if I had been notified anywhere else (the Action Center seems like it could be a good candidate for this). Finding nothing, I've come here to post my concerns.
But in order to maintain my required level of uptime, I need to KNOW when a drive has failed redundancy so I can correct the situation (by replacing the drive or other). Where is the notification that I am running on only one drive?
Update: In the previous thread, it was suggested I use the Windows Task Scheduler to make my own custom notification based on RAID failure events in the Event Viewer. After investigating this unnecessarily obtuse (but usable) solution, it was found
that Windows 7 does not generate any events for RAID synchronization failures. This is astounding. What I really want is graphical, e-mail, and event notification of a drive failure. I will take just an event. I find it amazing that
the new "Action Center" thinks it is important enough to tell me that my Flash player has a problem, but can't let me know that a drive is failing. This seems like the perfect job for an OS notification system: to warn you when the feces is about to
impact the ventilator.
*Update: the disk in question ended up dying completely and was replaced under warranty with no problems and the machine is now happily cruising along with 2 working and synchronized hard drives once again. I would have never noticed I had a failed
hard drive if I didn't check on my own.

This is another Kludge that works on Windows 7
Microsoft Sysinternals has the BGInfo tool, to which you can add VBScripts that generate additional data to include in the display.
For example:
Fields
- Custom - New - 'MyRAID'
* VB Script file
Path: C:\raidchk.vbs
The contents of raidchk.vbs are below and belong to three websites, put together.
Site A. provided the idea for a vbscript to parse the diskpart.exe info
Site B. provided a better script that produced wscript messages
Site C. explained the wscript object was not available in VB script and to use Createobject instead
Once the script is generating output, adding it to the BGInfo configuration display is as easy as adding the line:
MyRAID: <MyRAID>
The output of the script gets substituted into the display field.
An improvement would be to schedule it with Windows Task Scheduler so that it updates periodically. The length of the message output will dynamically "shift" the display front and center when a major event happens. The VBScript could also include (a) email, (b) event logging, (c) whistles-and-bells audio, etc.
The reason the information is so hard to get at appears to be because VDS is a COM object based service, which brokers a user tool connection to vendor raid COM interface. There is no native wmi provider that will subscribe to the Microsoft RAID notifications
and provide status information, let alone bind to a notification service like email or eventx. diskpart or dmadmin are the limits provided.
But it almost looks like something Powershell could handle.
VDS Under PowerShell - a hint that it could be done
Some references that the information is readily available:
Volume Object- Mirror is a type of volume to VDS
VDS_VOLUME_PROP structure - a VDS volume has a status structure
status - A
VDS_VOLUME_STATUS enumeration value that specifies the status of the volume.
health - A
VDS_HEALTH enumeration value that specifies the health state of the volume.
TransitionState - A
VDS_TRANSITION_STATE enumeration value that specifies the transition state of the volume.
VDS_VOLUME_NOTIFICATION structure - a VDS volume
An application can receive volume events by implementing the
IVdsAdviseSink interface and passing the interface pointer as an argument to the
IVdsService::Advise method.
To get the volume object, use the
IVdsService::GetObject method. You can then use the
IVdsVolume::GetProperties method or the
IVdsVolume2::GetProperties2 method to get the volume properties.
VDS Notifications
I say that without looking around very much. The recommended languages for implementing a provider are C/C++, etc., although I've never heard of a VBScript WMI provider; I think there would be data type problems between the languages. Thus we are left with polling, as with most scripting, or some sort of event trigger, like an eventx alert that runs a script as needed. The advanced XML configuration for Windows Task Scheduler in Win 7/2008 has a fine degree of control for matching events, if events are being logged by the VDS service with actual source data.
VDS is on the way out, the COM interface is deprecated and will not support storage spaces, likewise the API and tools for Storage Spaces will not support VDS. This is a "take a leap of faith" moment.
I tend to prefer to wait a year or two before trusting my data to the effort.
As for Software vs Intel ICH vs dedicated HBA ?
I'm interested at a low level for personal use, something easy and reliable beyond Tape backup or any system that requires me to perform an action, or allocate huge amounts of reduced performance time to feed the Tapes, feed the Backup service, etc.
I have tried every variation possible and come to the conclusion that at one time Hardware raid made sense on an enterprise level. But never the consumer. The RAID built into the motherboard tends to stay with the motherboard if that dies. Ditto with the
HBA, and it's impossible trying to source a compatible HBA a season or two after the latest greatest hardware release. A long term available HBA is going to cost lots of money, because they have no other way to make back the money they could have made by releasing
more new product, or to pay for the space to store the inventory. Ebay is not a great hardware vendor HBA sourcing plan. Tape backup and any major network backup system are too slow.
At the same time, BIOS/UEFI being what it is, trusting a bootable volume to RAID between versions from the same manufacturer or different manufacturers is foolhardy. There is no incentive for cross-vendor bootable RAID support. So plan to burn the bootable
operating system drive; it's expendable. Deliberately move the profile and data regions to either junction points or off to the mirror array. When a drive dies, replace it; when a motherboard dies, replace it and reinstall the operating system. When all else
fails, import the foreign "soft" raid mirror into another operating system.. and with 2003/XP being over 13 years old and still kicking.. that trumps any hardware vendor for longevity. If W7 lasts that long.. or longer.. it will be a good run. W8/2012
are in their infancy and have a lot of testing to go through in the real world.
And for offsite replication, static archiving. There is always eSATA and an external drive that could be added to the mirror and subsequently broken. Being natively driven by the operating system I would suspect it would be reliable.. but testing is always
the best way to shore up belief.
' Lists all logical drives on the local computer which are configured for
' software RAID and flags any redundant drive that is not in a "Healthy" state.
' Written for BGInfo, which supplies the Echo statement; WScript is not
' available there, so the exit-code line at the end is commented out.
' Supports Windows Vista/7, Windows 2008/R2
Option Explicit

Dim WshShell, oExec
Dim RegexParse
Dim hasError : hasError = 0

Set WshShell = CreateObject("WScript.Shell")
Set RegexParse = New RegExp

' Execute diskpart and capture the output of "list volume"
Set oExec = WshShell.Exec("%comspec% /c echo list volume | diskpart.exe")

' Columns: volume, letter, label, file system, type, size, unit, status
RegexParse.Pattern = "\s\s(Volume\s\d+)\s+([A-Z])\s+(.*)\s\s(NTFS|FAT)\s+(Mirror|RAID-5)\s+(\d+)\s+(..)\s\s([A-Za-z]*\s?[A-Za-z]*)(\s\s)*.*"

While Not oExec.StdOut.AtEndOfStream
    Dim regexMatches
    Dim Volume, Drive, Description, Redundancy, RaidStatus
    Dim CurrentLine : CurrentLine = oExec.StdOut.ReadLine

    Set regexMatches = RegexParse.Execute(CurrentLine)
    If (regexMatches.Count > 0) Then
        Dim match
        Set match = regexMatches(0)
        If match.SubMatches.Count >= 8 Then
            Volume = match.SubMatches(0)
            Drive = match.SubMatches(1)
            Description = Trim(match.SubMatches(2))
            Redundancy = match.SubMatches(4)
            RaidStatus = Trim(match.SubMatches(7))
        End If
        ' Anything other than "Healthy" on a Mirror/RAID-5 volume is a problem
        If RaidStatus <> "Healthy" Then
            hasError = 1
            Echo "**WARNING** "
        End If
        Echo "Status of " & Redundancy & " " & Drive & ": (" & Description & ") is """ & RaidStatus & """"
    End If
Wend

If (hasError) Then
    Echo ""
    Echo "WARNING: One or more redundant drives are not in a ""Healthy"" state!"
End If
' Outside BGInfo (e.g. under cscript) this would return 1 on error, 0 otherwise:
'WScript.Quit(hasError)
-- John Willis
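For anyone who would rather poll from something other than BGInfo, the same check can be sketched in Python. This is only an illustration under stated assumptions: the function name `unhealthy_volumes` is made up, the sample table is hand-written to resemble `list volume` output (including the "Failed Rd" status), and the parser assumes columns are separated by two or more spaces. Real output should be captured the same way the script above does, by piping `list volume` into diskpart.exe.

```python
import re

REDUNDANT_TYPES = {"Mirror", "RAID-5"}
SIZE = re.compile(r"\d+ [KMGT]?B")  # e.g. "931 GB"

# Hand-written sample resembling `list volume` output (an assumption, not a capture).
SAMPLE = """\
  Volume ###  Ltr  Label        Fs     Type        Size     Status     Info
  ----------  ---  -----------  -----  ----------  -------  ---------  --------
  Volume 0     C                NTFS   Partition    232 GB  Healthy    System
  Volume 1     D   Data         NTFS   Mirror       931 GB  Failed Rd
  Volume 2     E   Media        NTFS   Mirror       931 GB  Healthy
"""


def unhealthy_volumes(list_volume_output):
    """Return (volume, type, status) for redundant volumes that are not Healthy."""
    problems = []
    for line in list_volume_output.splitlines():
        fields = re.split(r"\s{2,}", line.strip())
        if not fields[0].startswith("Volume"):
            continue
        # The type column is the one followed directly by a size such as "931 GB".
        for i in range(1, len(fields) - 2):
            if fields[i] in REDUNDANT_TYPES and SIZE.fullmatch(fields[i + 1]):
                status = fields[i + 2]
                if status != "Healthy":
                    problems.append((fields[0], fields[i], status))
                break
    return problems


for volume, kind, status in unhealthy_volumes(SAMPLE):
    print(f"WARNING: {volume} ({kind}) is {status}")
```

Run from Task Scheduler, a script like this could send mail or write its own event when the list is non-empty, which is the notification the original poster was asking for. It is still polling, not a VDS notification.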

Raid failure on WAVE 7371

Hi,
I am getting Raid failure error on the WAVE 7371. All the disks on the WAVE are online.
Please advise how to fix this issue.
RAID Logical drive information:
raid-disk:    RAID-5 Impacted
                Enabled   (read-cache) Enabled (write-back)
Critical Alarms:
        Alarm ID                 Module/Submodule               Instance
   1 raid_failure              sysmon                       raid

Hi Ameen,
Please see the troubleshooting wiki here:
http://docwiki.cisco.com/wiki/Cisco_WAAS_Troubleshooting_Guide_for_Release_4.1.3_and_Later_--_Troubleshooting_Disk_and_Hardware_Problems
See if you have the most up to date raid firmware.
Regards,
Mike

Raid failure when connecting ipod

I have an Asus Striker Extreme motherboard, and all the latest drivers. Whenever I connect my iPod 8GB nano, I get a RAID failure, and have to either rebuild my RAID 0+1 or delete the array and reinstall my system. This is getting to be a pain. Is there a fix for this with a new driver or setting? Nothing else seems to cause the problem on this system, which I've used for 7 months without a problem before.

hi Holly!
hmmm. okay, are you getting an "Ipod Service Error" when you click on the Ipod updater, too?
if so, try toonz's technique, here:
toonz, "iTunes 5.01: Rending iPods useless Worldwide" #2, 12:01pm Oct 4, 2005 CDT
keep us posted.
love, b

MSI MPOWER MAX Z87 - RAID failure after BIOS update (settings reset)

I use the integrated Intel RAID controller to create a 2-disk RAID-0 of 6TB (2x3TB), but every time I update the BIOS, there is a RAID failure.
It does not matter if I use Live Update or the Update BIOS+ME option of the BIOS menu. All settings are lost, and after the boot the RAID controller is deactivated due to that. I go to BIOS right after the auto-reboot and re-enter my settings (RAID instead of AHCI), but the intel controller only sees 1 disk as a RAID member. Recovery option is not active when I enter the controller options (control+I). I always need to delete and recreate the RAID, which of course loses all data.
Very frustrating!
Am I doing the update wrong? I know I did it once and settings were not reset, my RAID was kept intact, but I believe it was only 1 time of 4 or 5 updates I did to my MPOWER MAX.

Hi, thanks for trying to help!
- I use Windows 8.1.
- My vga does not support UEFI so I use... legacy mode? (I'm not that familiar with BIOS->UEFI yet, sorry.)
- According to Intel RST, the RAID HDDs are on ports 2 and 3.
- This time the update was from 1.80 to 1.90.
1.70 to 1.80 also resulted in the same problem, but I used LiveUpdate.
For previous updates, I don't remember, but sometimes I used LiveUpdate, sometimes the BIOS feature.
I'm pretty sure that only once the RAID was kept intact, and I don't remember the versions or method I used, but I believe I used the BIOS method (don't know if needed BIOS+ME or not), not LiveUpdate.

RAID Failure Reported

My Arch Linux system is reporting a failure when I login via Gnome. The system has the RAID partitions:
md0 = swap
md1 = /boot
md2 = /
When I login via Gnome, the system alerts me with a serious looking warning that one of two disks has multiple bad sectors on it. I fired up a terminal window and can't see what the issue is:
Personalities : [raid1]
md2 : active raid1 sda3[0] sdb3[1]
974365187 blocks super 1.2 [2/2] [UU]
md1 : active raid1 sda2[0] sdb2[1]
393472 blocks [2/2] [UU]
md0 : active raid1 sda1[0] sdb1[1]
1999025 blocks super 1.2 [2/2] [UU]
From the info above, everything reports fine and shows both disks on each partition to be UP/UP.
Here is the graphical disk utility reporting an error:
http://img813.imageshack.us/i/raidq.png/

Look at the SMART output, or use smartmontools.
The disk probably has just a few bad sectors which it internally relocated, so the RAID arrays are still fine. But it may be an indication that the drive is slowly dying, anyway.
I'm in the same situation and ordered a new disk straight away.
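The /proc/mdstat listing quoted in the question can also be checked by a script instead of by eye. Below is a rough Python sketch; the function name `degraded_arrays` is made up for illustration, and it assumes array names of the form mdN. It reads the `[n/m]` device counts and the `[UU]` flags. As the reply above points out, this only reflects array membership, so a disk with relocated sectors still shows up as healthy here and SMART has to be checked separately.

```python
import re

ARRAY = re.compile(r"^(md\d+)\s*:")
STATE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")

# The output quoted in the question.
SAMPLE = """\
Personalities : [raid1]
md2 : active raid1 sda3[0] sdb3[1]
974365187 blocks super 1.2 [2/2] [UU]
md1 : active raid1 sda2[0] sdb2[1]
393472 blocks [2/2] [UU]
md0 : active raid1 sda1[0] sdb1[1]
1999025 blocks super 1.2 [2/2] [UU]
"""


def degraded_arrays(mdstat_text):
    """Return names of arrays with fewer active members than configured."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY.match(line.strip())
        if header:
            current = header.group(1)
            continue
        state = STATE.search(line)
        if state and current:
            expected, active = int(state.group(1)), int(state.group(2))
            # "_" marks a missing or failed member, e.g. [2/1] [U_]
            if active < expected or "_" in state.group(3):
                degraded.append(current)
            current = None
    return degraded


print(degraded_arrays(SAMPLE) or "all arrays report every member up")
```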

V440 hardware raid failure ( boot disk)

We ran raidctl -c c1t0d0 c1t1d0 and got "RAID Volume 'c0t0d0' created". If the c1t0d0 disk fails, how do we replace the HDD?

Perhaps follow the guidance that is in the V440's manuals?
http://docs.sun.com/app/docs/coll/v440

Raid Failure Log?

Hi there, I have OS X Tiger Server setup with a Raid Mirror using Disk Utility. Last week due to some network changes I hit the restart button on the "headless" machine because I was admittedly too lazy to find a monitor and a keyboard to go in and do it the right way.
However it couldn't start up again, and I heard the ominous "click" of death coming from one of the drives.
Sure enough the Master in the Master/Slave ATA drive arrangement that were setup as a Mirror RAID was dead, and in fact preventing the boot drive (on a separate IDE channel) from starting up :S After disconnecting power to just that drive things started up somewhat well with a now "degraded" RAID based off of just the other drive :P
My question here is: When did the drive actually die? Restarting might have demonstrated it was dead with the clicking and the preventing of the boot drive to start. But I feel it is entirely possible that the drive died some time ago and OS X was transparently working off of just one drive. But how do I find this out? Is there a log file that records this information?
Maybe I'm just not willing to admit that my force-restart may have caused this hardware failure, but even so, for those cases where things just happen, where is the log to find out?
Thanks for your help!

I am having a problem similar to the one here: http://community.spiceworks.com/topic/244382-cannot-access-file-server-by-name-but-can-by-ip?page=1 When accessing a server by name, I receive "\\servername is not accessible. You may not have permission to use this network resource. Contact the administrator of this server to find out if you have access permissions. The target account name is incorrect". The thread I linked to above is old and I didn't want to try to tag onto the end of it. I have a group of servers all on a local subnet. There are two DCs and 12 member servers. The main data server is one of the DCs. After some reboots, over half of the servers can no longer access the data server by name \\SHSMASTER, but can access it fine by IP. Ping resolves the name properly. I've ruled out anything that's specific to one station, like DNS...

Constant Raid failure NB/SB too hot?

Good evening gents
I have 2x OCZ Vertex 30GB drives in RAID-0 in my system and my RAID array fails every 1-2 months. I'm getting a little bit tired of recovering backups so I'm trying to find the source of the problem.
I have an MSI Eclipse SLI motherboard with the buggy NB, which sometimes (not often) goes up to 95 degrees (most of the time it's around 85 degrees). I already started a topic on the OCZ forums as well and they say it could be the NB/SB that messes up my RAID all the time.
I have 2 Samsung Spinpoints running in RAID-0 without any problems. In my opinion it couldn't be the NB/SB, because once every 1-2 months doesn't seem like a lot for the NB/SB to be the problem; I think it would occur a lot more often if it really was the NB/SB. Putting new cooling on the NB/SB would be my extreme solution, because I have my CPU water cooled and it would take me some time to take the mobo out again and try to fix the problem.
I have my i7 920 running @ 4Ghz 24/2 at max load 60 degrees and is Linx stable (Ran it for multiple hours)
Do you guys think that the NB/SB is the problem or do you have any suggestions?
Ps. i have no problems with my PC without those SSD's.
Thanks in advance :D

Quote
Do you guys think that the NB/SB is the problem or do you have any suggestions?
If it is the IOH, it would be the first actual reported case of trouble. As long as it doesn't go over 95C Tcontrol, data loss or corruption shouldn't take place. Some have run a considerable amount over that & haven't reported any failures or problems. As for the ICH, that could be a problem, but temp wise, they seem to run 'very' cool natively. If you want to add a fan for cooling the IOH, an Antec Spot Cool works well, & it can be mounted probably off of the top right mount point & goosenecked over towards the IOH.
The details of your system are a little sketchy, like OS, PSU, bios version, AHCI set prior to OS install, etc. I kept losing HDDs randomly on an Eclipse Plus & it turned out to be a faulty PSU +5V rail causing it.

Raid failure ESXi 6.0

Had this issue too. You should log into the iLO and check the logs there. The disk is in a state of pre-failure. Meaning SMART has detected imminent failure. Chances are with all the plugging in and out you have done you have corrupted some data.
Mine did the same thing as yours and then Windows started a CHECKDISK and recovered a bunch of errors. Then Windows booted fine but some files were damaged. Had to restore from backup.

Hello, I have an ML350 G5 with a P400. I know the P400 is not officially supported, but it still works. Today I was upgrading from ESXi 5.5 to ESXi 6.0 (fresh install). Once I rebooted after the setup one of the disks failed. I let the server boot and then the LED on the server showed me that the disk has failed. I removed it and plugged it back in to see if it will work again. It did. However, I decided to reboot just to make sure. During the shutdown process another HDD showed as failed. I pressed F2 to re-enable my RAID. All the data seems to be still there. However, after booting, the hard drive showed as failed again (the one in Bay 3, which failed at first). I tried to boot again with the old ESXi 5.5 USB stick and it looks like it is fine except that it says ready for rebuild. Any idea what could have caused this? Could the driver have...
This topic first appeared in the Spiceworks Community

Need urgent help with RAID failure on Graphics server (running OSX10)

Hey.
Here's the deal... Our graphics department has a Mac G4 (fully upgraded, newest patches, OS, etc.) that's got a 4 port SATA RAID controller. They use this with four 750GB hard drives as an external RAID (the OS is on a separate, 65GB hard drive). The way it's set up, through Apple's OS, is we have the four 750GBs in pairs... those pairs are striped, and then the two sets of striped 750s are mirrored. This has worked just fine for a number of months, but we recently had a power surge, and now we're unable to mount "Monkey" (the name of the 1.4TB array).
See the below screenshots for more clarification.
http://www.amcdoors.com/lorenzo/raidsetup.JPG
This shows the RAID card, and the four SATA cables coming out of it.
http://www.amcdoors.com/lorenzo/leftside1.JPG
As you can see in this and the below screenshots, this shows the 1.4TB Striped set as 'online', but the two below (the two 750GB hard drives) show up as "offline".
http://www.amcdoors.com/lorenzo/leftside2.JPG
this shows the actual drive i click on. same thing
http://www.amcdoors.com/lorenzo/leftside3a.JPG
If i click the entire array, you can see how it's set up (the Mirrored RAID set consisting of the two Striped RAID sets). The RAID sets (all 3 of them) are showing up as online, yet the drives are offline. No idea why. The drives are all powered, and are spinning, and plugged in.
http://www.amcdoors.com/lorenzo/leftside3b.JPG
when you click "Monkey Drive", nothing shows up in it. i don't remember if anything ever used to. if you click 'mount', nothing happens.
http://www.amcdoors.com/lorenzo/leftside4.JPG
http://www.amcdoors.com/lorenzo/leftside5.JPG
this shows the exact same problem as the first RAID set.
http://www.amcdoors.com/lorenzo/verifyRAIDset1.JPG
If you try to Verify the RAID set, it shows that no repairs are necessary
http://www.amcdoors.com/lorenzo/verifymonkey.JPG
If you try to verify Monkey, you get a "Volume Needs Repair" error... but when you try to repair it, you get this:
http://www.amcdoors.com/lorenzo/repairmonkey.JPG
Really need help; this is a pretty critical problem. Someone that's familiar with RAID solutions on Macs please help me out. We can't risk losing any data if at all possible.
Thanks
~Lorenzo

You might want to post this to the Server Products forums.

Software RAID Failure - my experience and solution

I just wanted to share this information with the iCloud community.
I searched a bit and did not find much information that was useful with regard to my software RAID issue.
I have 27 inch Mid 2011 iMac with SSD and Hard drive which has been great.
I added an external hard drive (I think if I mention any brand name the moderator will delete this post) which includes a nice aluminum case with two 3 TB hard drives within it, and it has a big blue light on the front and is connected via Thunderbolt. This unit is about 2 years old and I have it configured as a 3 TB mirrored RAID (RAID 1), a software RAID set up via Mac OS Disk Utility.
I had at one point a minor glitch which was fixed using another piece of software (again if I mention a brand the moderator will delete this post) which is like a 'Harddrive Fighter' or similar type name LOL.   So otherwise that RAID has served me well as a site for my Time Machine back up and Aperture Vault, etc. (I created a 1.5 TB Sparse bundle for Time Machine so that the backup would not use the entire 3 TBs)
I recently purchased a second aluminum block of drives, and set that up as a 4 TB RAID 1.
Each of the two RAIDs are set with the option of “Automatically rebuild RAID mirror sets” checked.
I put only about 400 gb on the new RAID to let it sit for a ‘burning in period.’
A few days ago the monitoring software from the vendor who sells the aluminum block of drives told me I had a problem. One of the drives had “Failed.”   The monitoring software strangely enough does not distinguish the drives so you can figure out which pair had the issue, so I assumed it was the New 8 TB model. Long story short, it was the older 6 TB model, but that does not matter for this discussion.
I contacted the vendor and this is part of their response.
“This is an indication that the Disk Utility application in Mac had a momentary problem communicating with the drive mechanism. As a result, it marked that drive as "failed" in the header information. Unfortunately, once this designation is applied to a drive by the OS, the Disk Utility will thereafter refuse to attempt any further operations with that disk until the incorrect "failed" marker is manually cleared off the drive.”
That did not sound very good to me…..back up killed by a SOFTWARE GLITCH?
“The solution is to remove the corrupted volume header, and allow the generation of a new one….This command will need to be done for each disk in the array… (using Terminal)…
diskutil zerodisk (identifier)
…3. After everything is finished, you should be able to exit Terminal, and go back into the Disk Utility Application to re-configure the RAID array on the device.”
Furthermore they said.
“If the Disk Utility has placed a flag into the RAID array header (which exists on both drives) then performing this procedure on a single drive will not correct anything.”
And…
“When a drive actually does fail, it typically stops appearing in the Disk Utility application altogether. In that circumstance, it will never be marked "failed" by the Disk Utility, so the header erase operation is not needed.”
This all sounded like a bad idea to me. And what does the vendor's RAID monitor software say then? “Disk Really Really FAILED, check for a fire.”
As I tried to figure out which drive was actually the bad RAID pair I stumbled on a solution.
First I noted that the OS Disk Utility did NOT show a fault in the RAID. It listed both RAIDs as “Online.” Thus no rebuilding was needed and it did not begin the rebuild process.
The Vendors disk monitor software saw some fault, but Mac was still able to read and write to the RAID, both disks in the mirror. I wrote a folder to the RAID and with various rebooting steps I pulled the “Bad” drive and looked at the “Good” Drive….the folder was there…I put the Bad drive back in and pulled the Good Drive and the folder was there on the “bad” drive. So it wrote to both drives. AND THE VENDORS MONITORING SOFTWARE SHOWED THE PREVIOUSLY LABELED ‘BAD’ DRIVE AS ‘GOOD’ AND THE MISSING DRIVE SLOT AS ‘BAD’.
My stumbled FIX.   I moved a bunch of files off the failed RAID to the new RAID but before I moved the sparse bundle, a folder of 500 gigs movies and some other really big folders the DISK UTILITY WINDOW (which I still had open) now showed that the RAID had a Defect and began rebuilding the mirror set itself, out of the blue!   I don't know why this happened. But moving about 1/2 of the data off of it perhaps did something? Any Ideas?
This process took a few hours as best I can tell (let it run overnight) and the next day the RAID was fine and the Vendors RAID monitor did not show a fault any longer.
So, the vendor's RAID monitoring software reported a “FAILED” drive without any specific error codes to look up. Perhaps they could have more info for the user on the specific fault? The vendor's support line said with certainty “the Volume Header is corrupted” and THE ONLY FIX is to completely ZERO THE DRIVE! This was not necessary as it turns out.
And the stick in the eye to me…..
“I've also sometimes seen the drives get marked as "failed" by the disk utility due to a shaky connection. In some cases, swapping the ends of the Thunderbolt cable will help with this. Something to try, perhaps, if your problems come back. “
Ya Right…..
Mike

Follow up.
After going through the zeroing process and rebuilding the RAID set three times, with various configurations, LaCie finally agreed to repair the unit under warranty.
I tried swapping the power supplies and Thunderbolt cables, and tried taking the drive out of the chain with its newer big brother. It still failed after a few days.
I just wanted to share more of what I learned with regard to rebuilding the RAID sets via the Terminal. The commands can be typed partially and a help paragraph will come up to give VERY cryptic descriptions of the proper use of the commands.
First, in Terminal you can use the command "diskutil appleRAID list" to list the drives that are in the RAID. This gives you the UUID for each physical drive. For example:
AppleRAID sets (1 found)
===============================================================================
Name:                 LaCie RAID 3TB
Unique ID:            84A93ADF-A7CA-4E5A-B8AE-8B4A8A6960CA
Type:                 Mirror
Status:               Online
Size:                 3.0 TB (3000248991744 Bytes)
Rebuild:              manual
Device Node:          disk4
# DevNode   UUID                                  Status     Size
0 disk3s2   D53F6A81-89F1-4FB3-86A9-8808006683C2 Online     3000248991744
- disk2s2   E58CA8F5-1D2C-423A-B4BE-FBAA80F85879 Spare      3000248991744
===============================================================================
In my situation with the failed RAID, I had an extra disk in this list with the status Missing/Failed.
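To pick out the members that need attention without reading the table by eye, something like the following might help. It is only a sketch, assuming the listing looks like the one above (member rows with five or more columns and a 36-character UUID in the third column); the function name raid_problem_members is made up for illustration, and the exact text of a Missing/Failed row is not shown in this thread, so treat the match as a guess.

```shell
# Sketch: print DevNode, UUID and Status for every member that is not "Online".
# Reads the text of `diskutil appleRAID list` on stdin, so it also works on saved output.
raid_problem_members() {
  awk '
    # Member rows have at least 5 fields and a 36-character UUID in field 3.
    # The "Unique ID:" line of the set has only 3 fields, so it is skipped.
    NF >= 5 && length($3) == 36 && $3 ~ /^[0-9A-Fa-f-]+$/ && $4 != "Online" {
      print $2, $3, $4
    }
  '
}

# On a Mac (assumption: same output layout as the listing above):
#   diskutil appleRAID list | raid_problem_members
```

For the listing above this would print the disk2s2 row, since its status is Spare rather than Online.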
The command is "diskutil appleRAID remove" and the cryptic help paragraph says:
Usage: diskutil appleRAID remove MemberDeviceName|MemberUUID
        RAIDSetVolumePath|RAIDSetDeviceName|RAIDSetUUID
MemberDeviceName|MemberUUID is the DevNode or UUID shown for the member in the "diskutil appleRAID list" output, and
RAIDSetVolumePath|RAIDSetDeviceName|RAIDSetUUID is the Device Node for the RAID, which here is /dev/disk4.
I used this command to remove the third entry (missing/failed). I did not copy the Terminal window text for that one, so I cannot show the list of three disks.
I could not get it to remove the disk2s2 disk listed as Spare; it gave an error message:
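Since the argument order is easy to get backwards (member first, RAID set second), a dry-run wrapper can be handy. This is just a sketch: raid_remove_cmd is a hypothetical helper name, and it only prints the command line for review, it does not run diskutil.

```shell
# Dry-run helper (hypothetical name): prints the remove command instead of running it.
raid_remove_cmd() {
  member=$1    # member UUID or device name, taken from the list output
  raidset=$2   # RAID set device node, e.g. /dev/disk4
  if [ -z "$member" ] || [ -z "$raidset" ]; then
    echo "usage: raid_remove_cmd MEMBER RAIDSET" >&2
    return 1
  fi
  printf 'diskutil appleRAID remove %s %s\n' "$member" "$raidset"
}

# Example, using the UUID and device node from the listing above:
#   raid_remove_cmd E58CA8F5-1D2C-423A-B4BE-FBAA80F85879 /dev/disk4
```

Once the printed line looks right, it can be pasted into Terminal as is.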
Michaels-iMac:~ mike_aronis$ diskutil appleraid remove E58CA8F5-1D2C-423A-B4BE-FBAA80F85879 /dev/disk4
Started RAID operation on disk4 LaCie RAID 3TB
Removing disk from RAID
Changing the disk type
Can't resize the file system on the disk "disk2s2"
Error: -69827: The partition cannot be resized
But I was able to remove it using the graphical interface Disk Utility program using the delete key.
I then rebuilt the RAID set by dragging the second drive back into the RAID set.
I could not get the command "diskutil appleRAID update AutoRebuild 1 /dev/disk4" to work: it tried to execute but HUNG. As further troubleshooting (not suggested by LaCie tech), I put the two drives into my newer LaCie 2big and rebuilt the RAID. I am going to leave it set up that way for a few days before I ship the old unit back, just to see whether the old drives work fine in the new RAID box (which would show that the old RAID box is the problem). I tried the AutoRebuild 1 command again just now and it gave an error.
Michaels-iMac:~ mike_aronis$ diskutil appleraid update autorebuild 1 /dev/disk4
Error updating RAID: Couldn't modify RAID (-69848)
Michaels-iMac:~ mike_aronis$
In my haste to rebuild the RAID set for the third or fourth time, as LaCie led me through the test-this-and-test-that phase, I forgot to click the "Auto Rebuild" option in the Disk Utility program.
Question for the more experienced:
As I was working on this issue, I noticed that each time I rebooted and did work in the Terminal (with and without the RAID plugged in to the Thunderbolt connection), the list of drives would change and my main boot drive would not stay listed as disk 0! Sometimes it would be disk 0, sometimes the RAID would be listed as disk 0. It's strange to me... I would have thought the designations disk0 and disk1 would always go to my two built-in drives (SSD and spinning drive).
Mike
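
On the drive-numbering question: as far as I know, macOS hands out the diskN identifiers in the order it discovers devices at each boot, so they are not guaranteed to stay the same, while the RAID set's Unique ID does not change. A minimal sketch, assuming the listing format shown above (the function name raid_device_for_uuid is made up for illustration), that looks up the current Device Node for a set by its Unique ID:

```shell
# Sketch: map a RAID set Unique ID to its current device node.
# Reads the text of `diskutil appleRAID list` on stdin.
raid_device_for_uuid() {
  awk -v want="$1" '
    # Remember the Unique ID of the set currently being described.
    $1 == "Unique" && $2 == "ID:" { current = $3 }
    # Print the device node only for the set that was asked for.
    $1 == "Device" && $2 == "Node:" && current == want && $3 ~ /^disk/ {
      print "/dev/" $3
    }
  '
}

# On a Mac (assumption: same output layout as the listing above):
#   diskutil appleRAID list | raid_device_for_uuid 84A93ADF-A7CA-4E5A-B8AE-8B4A8A6960CA
```

Looking the node up this way each time avoids hard-coding /dev/disk4 in commands that might run after a reboot.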

Mac Mini Server HDD raid failure help

My server has been running for years, perfectly, most of the time. It is set up as a mirrored raid set.
I noticed today that the second HDD is displaying a "failed" red flag in Disk Utility. I attempted a Rebuild and that failed.
In the First Aid tab, Verify returns errors: "This disk needs to be repaired using the Recovery HD..restart control r....etc."
There's my problem. No monitor or keyboard. I'm using screen sharing from an iMac.
Is there a way to run the server reboot remotely?
Is it time to purchase a new server or simply replace the faulty drive?
Cheers
Gaz

Hi Gaz,
Is it time to purchase a new server or simply replace the faulty drive?
I think replacing the faulty drive is the answer.

Apple-Installed RAID Failure

When I bought my G5, I ordered it with the dual 250 GB SATA drives in a RAID to create a 500 GB drive. Today, it appears that the RAID has failed and the computer will not boot properly. Or rather, it starts up, is fine for about 5 minutes, and then everything shuts down.
Using disk utility while booting from the (apparently) damaged drive does not reveal any errors because the drives seem to work fine until the crash. However, I installed 10.4.6 on an external drive, and while the RAID is recognized for about a minute after startup, it just quits and I get the device removal warning. After the RAID quits, running disk utility reveals that one of the disks has "Failed" and the other is "Offline."
I have tried resetting the PRAM and starting up in Safe Mode, but neither solved the problem. Is it possible to correct the error without losing data? Is there a way of at least stabilizing it long enough so that I can retrieve important data (because like an idiot, I didn't back up)?

Well I went into an Apple Store, and the only advice that they were able to give me was to try Prosoft's Data Rescue II, although none of the Geniuses had ever used it so they really didn't know if it would work.
As it turns out, if the controller is stable enough, Data Rescue can actually recover data from it. So I got myself a 500 GB external hard drive and ran drive clone (which simply copies the data from one drive to another) and the program didn't show any errors! There is the slight problem that the 500 GB drive that I got was about 1.4 gigs short of the internals, but I'll be pretty darn happy if I can keep 99.6% of my data!
Unfortunately the clone drive isn't mounting, but I assume that's either because the drive was not formatted, or because of that missing 1.4 GB. However, I do have a large internal drive that I can restore to from the clone drive, so I'm hoping everything will work out.

WAE 674 RAID Failure

I replaced the disk in Slot 3, which was defective. After replacement, the disk showed "Online" in show disk detail within 15 mins of replacement. This seemed odd, it usually requires more than 1 hour to rebuild a disk.   However, the RAID-5 status showed "Critical".
When I execute a "disk logical shutdown", reload, then attempt to execute "disk recreate-raid" I get this returned:
Failed to remove RAID Logical drive. (1,2)
Controllers found: 1
A selected physical drive is not available for use.
Command aborted.
Failed to create RAID Logical drive. (1,2)
Please, verify the disk does not contain mounted file system that remain in use.
Execute the command "(config)#disk logical shutdown" and reload the device before the array is recreated.
====================================================================================================
show disks tech-support>
Physical Device information
      Device #0
         Device is a Hard drive
         State                         : Ready
         Supported                     : Yes
         Transfer Speed                : SAS 3.0 Gb/s
         Reported Channel,Device       : 0,0
         Vendor                        : IBM-ESXS
         Model                         : ST3300656SS
         Firmware                      : BA49
         Serial number                 : 3QP2B8BF
         World-wide name               : 5000C50016999C98
         Size                          : 286102 MB
         Write Cache                   : Disabled (write-through)
         FRU                           : 43X0805
         PFA                           : No
      Device #1
         Device is a Hard drive
         State                         : Ready
         Supported                     : Yes
         Transfer Speed                : SAS 3.0 Gb/s
         Reported Channel,Device       : 0,2
         Vendor                        : IBM-ESXS
         Model                         : ST3300656SS
         Firmware                      : BA49
         Serial number                 : 3QP2B712
         World-wide name               : 5000C50016990110
         Size                          : 286102 MB
         Write Cache                   : Disabled (write-through)
         FRU                           : 43X0805
         PFA                           : No
      Device #2
         Device is a Hard drive
         State                         : Ready
         Supported                     : Yes
         Transfer Speed                : SAS 3.0 Gb/s
         Reported Channel,Device       : 0,3
         Vendor                        : IBM-ESXS
         Model                         : VPBPA300C3EST1 N
         Firmware                      : A529
         Serial number                 : JLVVUBVC
         World-wide name               : 5000CCA00930C3BB
         Size                          : 286102 MB
         Write Cache                   : Disabled (write-through)
         FRU                           : 43X0805
         PFA                           : No

Hi James,
The problem, I believe is here:
RAID Physical disk information:
disk00: Ready                      3QP2B8BF    286102 MB
disk02: Ready                      JLVVUBVC    286102 MB
disk03: Ready                      3QP2B712    286102 MB
If you check the output, disk01 is missing. One thing we should try is to remove disk03 and put it in place of disk01.
Further, the disks should be in the "Online" state, not the "Ready" state. This may be a hardware problem with the RAID controller.
Anyway, here is the last option we can apply:
Use the system recovery process, if you can. The steps you may want to perform are step 7: Rebuild RAID. And then Step 8: Install .bin image.
Let us know how it goes.
Thanks.
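
The missing-slot check described in the reply above can be sketched as a small script. It assumes the "RAID Physical disk information" lines look exactly like the three quoted in the reply (diskNN: followed by state, serial and size); the function name missing_slots is made up for illustration, and it only reports gaps below the highest slot number it sees.

```shell
# Sketch: report slot numbers that are absent from a "diskNN: ..." listing on stdin.
missing_slots() {
  awk '
    $1 ~ /^disk[0-9]+:$/ {
      n = $1
      sub(/^disk/, "", n)   # strip the "disk" prefix
      sub(/:$/, "", n)      # strip the trailing colon
      seen[n + 0] = 1
      if (n + 0 > max) max = n + 0
      found = 1
    }
    END {
      if (!found) exit
      for (i = 0; i <= max; i++)
        if (!(i in seen)) printf "disk%02d\n", i
    }
  '
}
```

Fed the three lines from the reply (disk00, disk02, disk03), it prints disk01.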
