[SOLVED] EXT4 Data Corruption Bug Linux 3.6.2 & 3.6.3

Be careful
EXT4 Data Corruption Bug Hits Stable Linux Kernels
http://www.phoronix.com/scan.php?page=n … px=MTIxNDQ
https://lkml.org/lkml/2012/10/23/779
Edited: Removed the [ALERT]  label.
Last edited by ontobelli (2012-11-01 04:58:43)

headkase wrote:
Even though it is a severe bug the chances of it happening to you are low.  You have to unmount and immediately remount an EXT4 partition twice in a row for it to happen.  On a normally operating system that is not a normal thing to happen.  Just wait on your desktop for 5 minutes before rebooting again.
Arch, as a general rule, tends to stick as close to upstream as possible.  I'm sure the devs are very competent people but a quick hack or branch revert has the possibility of introducing issues of its own.  With the chance of the bug occurring low on a normally operating system I think it is better to wait for a fix from upstream.
Well, maybe I am a bit overstressed about this, since I have quite a lot of trouble with this computer that I cannot find solutions to. A kernel panic after a reboot this morning (probably not related to this) also put me in a bad mood.
Anyway.

Similar Messages

  • [Solved] Occasional data corruption on P35 NeoF IDE port.

    This is extremely anecdotal, but a couple of times over the past few months my Seagate ST380011A has shown some sort of data corruption. I have tested the drive with the long test in SeaTools and it passes OK. Could the Marvell controller be corrupting data? The system is bootable, of course, but on occasion my Java cache was corrupted, and today my Windows Media Player database had to be rebuilt because of corruption. I have run chkdsk too: no errors. I have to say that the slave drive on the cable doesn't seem to be affected, so I haven't ruled out a drive problem, as it's always my C drive.
    Is it just Windows being windows or do I have a deeper problem?
    Thanks
    Tim

    Already changed that. Will have to wait and see if it happens again. Thanks 
    Edit: A funny thing happened when I changed the IDE cable: I kept getting Windows chkdsk on reboot, so I changed the cable again and it wouldn't go away, so I had to format the slave drive (not the one with problems) to get the drive clean again (not dirty). It's a puzzle. Seagate's tools say the drives are OK. Weird.

  • M500/M5x0 QUEUED-TRIM data corruption alert (mostly for Linux users)

    The M550 (all released firmware) and the M500 (all released firmware up to MU04) can cause data corruption when QUEUED TRIM is used. Since Crucial is not urging everyone to update to MU05 and is taking its time with the M550 update, I assume that Windows does not issue QUEUED TRIM by default, and therefore does not trigger the issue (yet). I have no idea about the Intel RST enhanced Windows drivers, MacOS or FreeBSD. Chances are they cannot trigger the bug, but you might want to check with the vendors.
     EDIT: The M500 MU05 firmware still has the QUEUED TRIM data-killer bug; there are no safe firmware versions.
     EDIT: This is a problem on several outdated versions of the Linux kernel, in the 3.12, 3.13, 3.14 and 3.17 branches. Linux releases older than 3.12 will NOT trigger the bug. Recent releases of the 3.12, 3.13, 3.14 and 3.17 branches have a blacklist in place and will NOT trigger the bug. The 3.15 and 3.16 kernels also have the blacklist, and won't trigger the firmware bug.
     Dangerous kernels:
     3.12 (before 3.12.29)
     3.13 (before 3.13.7)
     3.14 (before 3.14.20)
     3.17 (before 3.17.1) - regression in the blacklist, fixed in 3.17.1
     Safe kernels:
     anything before 3.12
     3.12.29 and later
     3.13.7 and later
     3.14.20 and later
     3.15 (all)
     3.16 (all)
     3.17.1 and later
     Bug workaround for any kernel version (severely degrades performance on most workloads): disable NCQ on the kernel command line by adding the libata.force=noncq parameter in the bootloader. The "uname -r" command will tell you the Linux kernel release you're running.
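The dangerous/safe ranges above can be turned into a quick check. A minimal sketch, assuming plain x.y.z kernel version strings (distro suffixes such as "-generic" are stripped first); it only encodes the list quoted above and is not an official tool:

```shell
#!/bin/sh
# Classify a kernel release as "dangerous" or "safe" with respect to the
# QUEUED TRIM blacklist regressions described above.
is_dangerous() {
    v=$(echo "$1" | sed 's/[^0-9.].*//')    # strip suffixes like "-031405-generic"
    major=$(echo "$v" | cut -d. -f1)
    minor=$(echo "$v" | cut -d. -f2)
    patch=$(echo "$v" | cut -d. -f3)
    patch=${patch:-0}                        # "3.15" -> patch 0
    [ "$major" -ne 3 ] && { echo safe; return; }
    case "$minor" in
        12) [ "$patch" -lt 29 ] && echo dangerous || echo safe ;;
        13) [ "$patch" -lt 7 ]  && echo dangerous || echo safe ;;
        14) [ "$patch" -lt 20 ] && echo dangerous || echo safe ;;
        17) [ "$patch" -lt 1 ]  && echo dangerous || echo safe ;;
        *)  echo safe ;;                     # <3.12, 3.15, 3.16 all have the blacklist
    esac
}

is_dangerous "$(uname -r)"
```

If it prints "dangerous", either update the kernel or boot with libata.force=noncq until you can.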

    bogdan wrote:
    Thanks for the clarification.
    uname -r
    3.14.5-031405-generic
    After the first reboot I noticed the following entries:
    [ 25.818233] ata1: log page 10h reported inactive tag 0
    [ 25.818242] ata1.00: exception Emask 0x1 SAct 0x50000000 SErr 0x0 action 0x0
    [ 25.818244] ata1.00: irq_stat 0x40000008
    [ 25.818247] ata1.00: failed command: READ FPDMA QUEUED
    [ 25.818252] ata1.00: cmd 60/60:e0:78:d4:15/00:00:09:00:00/40 tag 28 ncq 49152 in
    [ 25.818252] res 40/00:f4:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
    [ 25.818254] ata1.00: status: { DRDY }
    [ 25.818256] ata1.00: failed command: SEND FPDMA QUEUED
    [ 25.818260] ata1.00: cmd 64/01:f0:00:00:00/00:00:00:00:00/a0 tag 30 ncq 512 out
    [ 25.818260] res 40/00:f4:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
    [ 25.818262] ata1.00: status: { DRDY }
    [ 25.818490] ata1.00: supports DRM functions and may not be fully accessible
    [ 25.824747] ata1.00: supports DRM functions and may not be fully accessible
    [ 25.830741] ata1.00: configured for UDMA/133
    [ 25.830754] ata1.00: device reported invalid CHS sector 0
    [ 25.830779] sd 0:0:0:0: [sda]
    [ 25.830781] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    [ 25.830783] sd 0:0:0:0: [sda]
    [ 25.830784] Sense Key : Aborted Command [current] [descriptor]
    [ 25.830787] Descriptor sense data with sense descriptors (in hex):
    [ 25.830788] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
    [ 25.830795] 00 00 00 00
    [ 25.830798] sd 0:0:0:0: [sda]
    [ 25.830800] Add. Sense: No additional sense information
    [ 25.830802] sd 0:0:0:0: [sda] CDB:
    [ 25.830804] Write same(16): 93 08 00 00 00 00 02 93 68 38 00 00 00 08 00 00
    [ 25.830812] end_request: I/O error, dev sda, sector 43214904
    [ 25.830827] ata1: EH complete
    [ 25.831278] EXT4-fs (sda1): discard request in group:164 block:27655 count:1 failed with -5
    Then I created large files (1 GB, 15 GB and 30 GB), deleted them, and issued an fstrim command:
    sudo fstrim -v /
    /: 106956468224 bytes were trimmed
    Right now I have no more errors in the log file except those recorded after the first reboot, so I guess I should wait for data corruption? It should show up much faster if you have the filesystem mounted with the "discard" option, which enables online discard mode. My best guess is that corruption won't trigger on just any write; likely you want to have pending writes inside the SSD, and maybe cause a trim near them, or something else like that. Running the "mount" command as root should show you the mount options ("sudo mount" might do it in default Ubuntu). "sudo mount -o discard,remount /" might enable it if it isn't already there. Or try "sudo su -" to go root, and issue the commands without "sudo". Now, where the corruption, should it happen, will end up going, I don't know.
    EDIT: Doing a lot of filesystem work might help as well. Maybe running bonnie++ (warning: will do a lot of writes), or several concurrent file creation/removal workloads. Doing it either in online discard mode, or running fstrim concurrently, should do it, I guess.
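Checking whether a filesystem is already in online discard mode boils down to looking for the "discard" flag among its mount options. A minimal sketch that parses a /proc/mounts-style line; the sample device and option strings are illustrative:

```shell
#!/bin/sh
# Report whether a /proc/mounts entry carries the "discard" mount option.
has_discard() {
    # $1 is one line from /proc/mounts: device mountpoint fstype options dump pass
    opts=$(echo "$1" | awk '{print $4}')
    case ",$opts," in
        *,discard,*) echo yes ;;
        *)           echo no ;;
    esac
}

# Typical use against the live system (root fs):
# grep ' / ' /proc/mounts | while read -r line; do has_discard "$line"; done
has_discard "/dev/sda1 / ext4 rw,relatime,discard,data=ordered 0 0"
```

The comma padding avoids false matches on options that merely contain the word (e.g. "nodiscard").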

  • How can I minimise data corruption?

    Mac OS X is great, and one of the greatest things it has achieved is an environment so stable that it almost never crashes.
    However, for me the next BIG problem with using computers is data corruption. Data corruption problems bug my life as much now as system freezes/crashes did 5 years ago.
    For some reason, it often seems to be preferences files that become corrupt. I don't know why, or whether other files are becoming corrupt too and I've just not discovered it yet. Sometimes I wonder whether it's because of all the junk I install, or the length of time since my last clean format. However, with my recent purchase of a MacBook, all my preference files became corrupt within a couple of months, which goes against those theories. My MacBook has minimal software installed and is generally kept quite simple.
    Obviously backing up is an important strategy, but that leads to a whole load of decisions like, how often to backup, do you keep incremental backups, do you restore absolutely everything when you discover 1 or 2 corrupt files (how do you know if others have become corrupt?).
    Shutting down correctly is something I always do, unless something prevents me from doing so, like power cuts. I've also often had a problem with the screen remaining blank after the MacBook has slept or had the screensaver on. On occasion I've had to hold down the power button to shut it down and get it going again.
    I've looked into uninterruptible power supplies. Unfortunately, the ideal setup, with an additional battery to provide a few hours of power, is very expensive. Also, shouldn't the MacBook be immune from problems caused by power fluctuations because of its battery? I certainly did get a system crash recently when there was a power cut, but at the time I just wondered if it was due to the wireless router going off.
    .mac and idisk seem to cause their share of problems. Again, I'm not sure if these are the cause or a consequence of the problems. I have iDisk syncing switched on, and on a few occasions it's vanished and I've had to reboot to get things back to normal. Recently there have been warnings of clashes between .mac and the local idisk.
    FileVault is another possible cause of problems. I've read people advising against its use. However, if someone is willing to steal my MacBook, I don't want that sort of person having access to anything, whether it's Address Book contacts, calendars, Word documents or anything financial. OK, people suggest creating an encrypted disk image, but that doesn't solve the problem of preventing people from accessing Address Book or iCal.
    What I'd really like to know is, what are the main causes of data corruption. If I can identify the causes I might be better prepared for trying to prevent it. For example, if 99% of data corruption is due to power fluctuation then I might accept that I need to spend the money on a UPS.
    Once identifying the possible causes, what can be done to prevent them. Would a RAID 1 configuration protect against data corruption, or is it only good in cases of catastrophic drive failure? I've just purchased a 500GB external Firewire 800 drive, which raises the option of creating a RAID 1 with my 2 built in drives.
    Sorry for so many questions, but I just really need to get this sorted. Since moving from OS 9 to OS X this has regularly been my biggest cause of troubles.

    Hi, bilbo_baggins.
    You wrote: "What I'd really like to know is, what are the main causes of data corruption..." You've already identified them, but you seem reluctant to implement the procedures needed to mitigate or avoid those causes that can be mitigated or avoided, in particular:
    • Power outages or power problems.
    • Improper shutdowns.
    • Hardware problems or failures, e.g. hard drive failures, bad sectors, bad RAM, etc.
    • Bad programming.
    I can understand your position since:
    • Not everything one needs to run their computer comes in the box: additional money must be spent.
    • The solutions often seem more complex to implement than they really are. One needs some guidance, which again does not come in the box, and few books address preventing problems before they occur.
    Here's my advice:
    1. Implementing a comprehensive Backup and Recovery Solution and using it regularly is essential to assure against data loss in the event of a hard drive failure or other problems. For advice on the backup and recovery solution I employ, see my "Backup and Recovery" FAQ. Using a personal computer without backup and recovery is like driving without auto insurance. Likewise, without a Backup and Recovery solution, you are accepting the risk of potentially losing all of your data at some point.
    2. Perform the little bit of required, regular maintenance: see my "Maintaining Mac OS X" FAQ. It covers my advice on "regular maintenance" and dispels some common "maintenance myths."
    3. If you use a desktop Mac, you need an Uninterruptible Power Supply: power outages and other power problems (surges, spikes, brownouts, etc.) can not only cause data corruption but damage your hardware. I have complete advice on selecting a UPS in the "Protecting Against Power Problems" chapter in my book. Don't simply walk into a store and buy the first UPS recommended by a clerk: the UPS needs to be configured and sized to match your computer setup. You don't need hours of battery run time: 10-15 minutes is sufficient to save your work and perform a proper shutdown, or for a modern UPS to perform an automatic shutdown if your computer is running in your absence.
    4. If you regularly "solve" problems by performing a hard restart (pressing and holding the power button or, on Macs so equipped, the restart button), then go back to work without first troubleshooting the cause of the problem, you risk letting a small problem compound into a larger problem. At a minimum, after a hard restart you should:
    • Run the procedure specified in my "Resolving Disk, Permission, and Cache Corruption" FAQ.
    • Then troubleshoot the cause of the problem that led to the hard restart.
    My book also has an entire chapter on methods for troubleshooting "Freezes and Hangs."
    5. Likewise, hoping that by installing a Mac OS X Update will fix a problem, or simply reinstalling one, without first checking for other problems, can make a bad problem worse. Before installing software updates, you may wish to consider the advice in my "Installing Software Updates" FAQ. Taking the steps therein before installing an update often helps avert problems and gives you a fallback position in case trouble arises.
    6. FileVault does not corrupt data, but it, like any hard drive or disk image, doesn't respond well to the causes cited above. This is why it is essential to regularly back up your encrypted Home folder using a comprehensive Backup and Recovery solution. FileVault is an "all your eggs in one basket" solution: if bad sectors develop on the hard drive in the area occupied by your encrypted Home folder, you could lose all the data therein without a backup.
    7. RAID: IMO, unless one is running a high-volume transaction server with a 99.999% ("Five Nines") availability requirement, RAID is overkill. For example, unless you're running a bank, a brokerage, or a major e-commerce site, you're probably spending sums of time and money with RAID that could be applied elsewhere.
    RAID is high on the "geek chic" scale, low on the "average user" practicality scale, and high on the "complex to troubleshoot" scale when problems arise. The average user is better served by implementing a comprehensive Backup and Recovery solution and using it regularly.
    8. I don't use .Mac — and hence, don't use an iDisk — so I can't advise you there. However, I suspect that if you're having problems with these, and the causes are unrelated to issues of Apple Server availability, then I'd suspect they are related to the other issues cited above.
    9. You can't completely avoid problems caused by bad programming, but you can minimize the risk by not installing every bit of shareware, freeware, or beta code you read about just to "try it out." Stick to reliable, proven applications — shareware or freeware offerings that are highly rated on sites like MacUpdate and VersionTracker — as well as commercial software from major software houses. Likewise, a Backup and Recovery solution can help here.
    10. Personal computers today are not much more advanced than automobiles were in the 1920's and '30s: to drive back then, you had to be part mechanic as well as driver. Cars today still require regular maintenance. It's the same with personal computers today: you need to be prepared for troubleshooting them (mechanic) as well as using them (driver). Computer hardware can fail, just as autos can break down, usually at the worst possible moment.
    People whose homes or offices have several Macs, a network, and the other peripherals normally associated with such setups — printers, scanners, etc. — are running their own data centers but don't know it. Educating yourself is helpful: my "Learning About Mac OS X" FAQ has a number of resources that you will find helpful including books, online training, and more. My book focuses exclusively on troubleshooting, with a heavy emphasis on preventing problems before they occur and being prepared for them should they arise.
    Good luck!
    Dr. Smoke
    Author: Troubleshooting Mac® OS X
    Note: The information provided in the link(s) above is freely available. However, because I own The X Lab™, a commercial Web site to which some of these links point, the Apple Discussions Terms of Use require I include the following disclosure statement with this post:
    I may receive some form of compensation, financial or otherwise, from my recommendation or link.

  • Ibase Component and Partner Update - Data Corruption

    Hello Everyone,
    I'm facing a peculiar issue with updating partners of an IBase component through a custom program.
    This is the requirement:
    Under certain conditions, Partner A attached to the components of an Ibase have to be replaced by Partner B for the same partner function.
    If the IBase has more than one component and both have the same partner, then data corruption results. The data corruption is a double entry in table IBPART, where I end up getting 2 valid entries for a PARTNERSET record.
    I'm using FM CRM_ICSS_CHANGE_COMPONENT to carry out the partner update.
    Here are the steps i'm using:
    1. I'm Looping at the Ibase
    2. I fill in the Component structure I_COMP with the Ibase and the Instance 1.
    3. I fill the partner structure I_PARTNER with the two partner records- Partner A (For Deletion by setting field UPDIND as 'D') and Partner B (For addition by setting Field UPDIND as 'I').
    4. Then the loop continues, updating the second component with the same details.
    After the Update, the following is happening at the table level.
    1. Table IBPART gets 2 records which are valid for each instance. (Ideally, there should be only 1 record for each component, which then links to multiple partner functions in table CRMD_PARTNER.) The two records differ only slightly in their VALID FROM timestamp, but both records are valid at the current time.
    This is resulting in a short dump when i try to go to the partner section from IB52 transaction.
    I think the main reason for this is that table IBPART is not locking when the first update (the deletion) happens, and hence I end up with two records.
    Can anyone help me out with this?
    Regards
    Dharmendra

    Hi,
    we couldn't completely solve the issue. I could find no way to lock the partner update to a component. But this is what we managed to do, and so far we haven't received any more data corruption errors. We made a copy of the FM CRM_ICSS_CHANGE_COMPONENT and made it an UPDATE FM instead of a normal FM. This somehow mitigated the issue, and so far we haven't seen the problem resurface. I'm not sure if this will work for you.
    Thanks n Regards
    Dharmendra

  • Data corruption in Socket communication !!!!

    The application I am developing is a typical client-server app using Java sockets with TCP/IP. I am using DataInputStream/DataOutputStream for reading/writing. As long as the number of streams is less than 25 it works fine and there is no data corruption, but if we create more streams it starts happening...
    This data corruption takes place only for a few messages, not for every message.
    I am using JDK 1.2.2 on Linux.
    Can somebody give me a solution?

    Hi!
    I've seen this before on many occasions. As you've noticed, it only occurs on some messages and when the number of threads gets higher and higher. All it is, is a problem with the threading of the underlying Java VM. Your solution, and I kid you not, is to install the latest Java VM, as they have fixed threading issues and have properly synchronized the necessary blocks of code.
    Cheers,
    Mike
    ps Do I get my points? :-)

  • [CORRUPTION] data corruption after installation of SATA RAid-1

    I have a very weird problem since I installed 2 Seagate SATA harddisks in RAID-1 on the onboard RAID controller VT8237.
    After the installation my primary Western Digital HD (IDE) got "corrupted", could not start Windows XP anymore, and I noticed a terrible slowdown of data transfer between hard disks. I have been searching the web but could not find the same problem.
    Can someone please give any hints on how to solve the problem?  See signature for configuration.
    voltages (BIOS health monitor):
    +12V: 11,67V
    +5V: 5,06V
    BIOS Settings:
    VT8237 PATA-IDE controller: enabled
    VT8237 SATA-IDE controller: enabled
    V-Link Data 2X Support: enabled
    Boot sequence: only boot for IDE-1
    Important remarks:
    1. When I disable VT8237 SATA-IDE, making a backup with Norton Ghost from the corrupted HD is really much faster than when it's enabled. I think that from the moment it's enabled, data corruption starts.
    2. Before enabling SATA, everything was working fine.
    3. could it be that V-Link Data 2X Support is giving problems?
    4. Should I go for an external PCI sata card with raid support to solve all my problems?
    I really would appreciate if someone could give some hints on how to solve the problem.
    Thank you
    JohnQM

    Quote from: Sharp on 22-May-05, 00:35:53
    Hello,
    I am not sure what the V-Line Data x2 setting does.
    Try it out and see what happens.
    But the PSU is a problem And you should consider replacing it.
    After reading your post, I have :
    - purchased a new PSU (480W from Tagan which is rated as an excellent PSU)
    - disabled V-Link Data 2X, and the corruption seems to have gone, but WHY?
    What does V-Link Data 2X actually do? Something with the north and south bridge? Something with the south bridge and the AGP video card?!?
    Thanks
    J.

  • Data corrupt block

    os Sun 5.10 oracle version 10.2.0.2 RAC 2 node
    alert.log contents:
    Hex dump of (file 206, block 393208) in trace file /oracle/app/oracle/admin/DBPGIC/udump/dbpgic1_ora_1424.trc
    Corrupt block relative dba: 0x3385fff8 (file 206, block 393208)
    Bad header found during backing up datafile
    Data in bad block:
    type: 32 format: 0 rdba: 0x00000001
    last change scn: 0x0000.98b00394 seq: 0x0 flg: 0x00
    spare1: 0x1 spare2: 0x27 spare3: 0x2
    consistency value in tail: 0x00000001
    check value in block header: 0x0
    block checksum disabled
    Reread of blocknum=393208, file=/dev/md/vg_rac06/rdsk/d119. found same corrupt data
    Reread of blocknum=393208, file=/dev/md/vg_rac06/rdsk/d119. found same corrupt data
    Reread of blocknum=393208, file=/dev/md/vg_rac06/rdsk/d119. found same corrupt data
    Reread of blocknum=393208, file=/dev/md/vg_rac06/rdsk/d119. found same corrupt data
    Reread of blocknum=393208, file=/dev/md/vg_rac06/rdsk/d119. found same corrupt data
    When I search for the Block id where the corruption occurred, the Block id cannot be found.
    I searched via dba_extents.
    I wonder whether the Block id cannot be found because of the corruption.
    If I run an export, the data exports normally.

    That's fortunate: it looks like the block corruption did not occur in a block that actually stores data. It also looks like you discovered it through an rman backup; is that right?
    Since the scn is 0x0000.98b00394 rather than 0x0000.00000000, this looks like a soft corruption rather than a physical corruption.
    If so, there is a good chance this is a bug, and searching turned up:
    Bug 4411228 - Block corruption with mixture of file system and RAW files
    It may not be this one, though.
    The handling and root-cause analysis of this kind of block corruption should be formally requested through Oracle Corporation. Please open an SR through metalink.
    Export cannot detect block corruption above the high water mark, and there are a few other cases, listed below, that it cannot detect either.
    db verify (dbv) cannot find physical corruption; it can only find soft block corruption.
    In my experience, even when physical corruption had occurred and the datafile could not even be copied to /dev/null, dbv did not find the problem.
    That makes rman the best method. rman backs up the data up to the high water mark while also checking the entire datafile.
    Since it checks not only physical corruption but logical corruption as well, I think rman is the best tool for this kind of inspection.
    The Export Utility
    # Use a full export to check database consistency
    # Export performs a full scan for all tables
    # Export only reads:
    - User data below the high-water mark
    - Parts of the data dictionary, while looking up information concerning the objects being exported
    # Export does not detect the following:
    - Disk corruptions above the high-water mark
    - Index corruptions
    - Free or temporary extent corruptions
    - Column data corruption (like invalid date values)
    The proper way to recover from block corruption is to restore and then recover, but the backup you would restore from may itself already contain the block corruption. It is therefore better to restore on another server first, confirm the datafile is sound, and only then restore it to the production environment.
    If the backup copies also contain the block corruption, or if there is no time to spare, it would be better to move the data to another tablespace via move tablespace (or index rebuild), drop the problematic tablespace, and recreate it. (Since there is currently no data loss, the move tablespace / rebuild index approach looks good.)
    Handling Corruptions
    Check the alert file and system log file
    Use diagnostic tools to determine the type of corruption
    Dump blocks to find out what is wrong
    Determine whether the error persists by running checks multiple times
    Recover data from the corrupted object if necessary
    Preferred resolution method: media recovery
    Handling Corruptions
    Always try to find out if the error is permanent. Run the analyze command multiple times or, if possible, perform a shutdown and a startup and try again to perform the operation that failed earlier.
    Find out whether there are more corruptions. If you encounter one, there may be other corrupted blocks, as well. Use tools like DBVERIFY for this.
    Before you try to salvage the data, perform a block dump as evidence to identify the actual cause of the corruption.
    Make a hex dump of the bad block, using UNIX dd and od -x.
    Consider performing a redo log dump to check all the changes that were made to the block so that you can discover when the corruption occurred.
    Note: Remember that when you have a block corruption, performing media recovery is the recommended process after the hardware is verified.
    Resolve any hardware issues:
    - Memory boards
    - Disk controllers
    - Disks
    Recover or restore data from the corrupt object if necessary
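The block-dump step above (dd plus od -x) can be sketched as a small helper. The 8 KB block size and the device path in the comment are illustrative values taken from the alert.log excerpt earlier in the thread; substitute your own datafile and DB_BLOCK_SIZE:

```shell
#!/bin/sh
# Hex-dump a single suspect block from a datafile: seek to the block with dd,
# read exactly one block, and render it as hex words with od -x.
dump_block() {
    # $1 = datafile, $2 = block size in bytes, $3 = block number
    dd if="$1" bs="$2" skip="$3" count=1 2>/dev/null | od -x
}

# e.g. for the corrupt block reported in the alert log (illustrative):
# dump_block /dev/md/vg_rac06/rdsk/d119 8192 393208
```

Keep the output with your evidence before attempting any salvage, as recommended above.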
    Handling Corruptions (continued)
    There is no point in continuing to work if there are hardware failures. When you encounter hardware problems, the vendor should be contacted and the machine should be checked and fixed before continuing. A full hardware diagnostics should be run.
    Many types of hardware failures are possible:
    Bad I/O hardware or firmware
    Operating system I/O or caching problem
    Memory or paging problems
    Disk repair utilities
    Here is some related material.
    All About Data Blocks Corruption in Oracle
    Vijaya R. Dumpa
    Data Block Overview:
    Oracle allocates logical database space for all data in a database. The units of database space allocation are data blocks (also called logical blocks, Oracle blocks, or pages), extents, and segments. The next level of logical database space is an extent. An extent is a specific number of contiguous data blocks allocated for storing a specific type of information. The level of logical database storage above an extent is called a segment. The high water mark is the boundary between used and unused space in a segment.
    The header contains general block information, such as the block address and the type of segment (for example, data, index, or rollback).
    Table Directory, this portion of the data block contains information about the table having rows in this block.
    Row Directory, this portion of the data block contains information about the actual rows in the block (including addresses for each row piece in the row data area).
    Free space is allocated for insertion of new rows and for updates to rows that require additional space.
    Row data, this portion of the data block contains rows in this block.
    Analyze the Table structure to identify block corruption:
    By analyzing the table structure and its associated objects, you can perform a detailed check of data blocks to identify block corruption:
    SQL> analyze table_name/index_name/cluster_name ... validate structure cascade;
    Detecting data block corruption using the DBVERIFY Utility:
    DBVERIFY is an external command-line utility that performs a physical data structure integrity check on an offline database. It can be used against backup files and online files. Integrity checks are significantly faster if you run against an offline database.
    Restrictions:
    DBVERIFY checks are limited to cache-managed blocks. It’s only for use with datafiles, it will not work against control files or redo logs.
    The following example is sample output of verification for the data file system_ts_01.dbf. And its Start block is 9 and end block is 25. Blocksize parameter is required only if the file to be verified has a non-2kb block size. Logfile parameter specifies the file to which logging information should be written. The feedback parameter has been given the value 2 to display one dot on the screen for every 2 blocks processed.
    $ dbv file=system_ts_01.dbf start=9 end=25 blocksize=16384 logfile=dbvsys_ts.log feedback=2
    DBVERIFY: Release 8.1.7.3.0 - Production on Fri Sep 13 14:11:52 2002
    (c) Copyright 2000 Oracle Corporation. All rights reserved.
    Output:
    $ pg dbvsys_ts.log
    DBVERIFY: Release 8.1.7.3.0 - Production on Fri Sep 13 14:11:52 2002
    (c) Copyright 2000 Oracle Corporation. All rights reserved.
    DBVERIFY - Verification starting : FILE = system_ts_01.dbf
    DBVERIFY - Verification complete
    Total Pages Examined : 17
    Total Pages Processed (Data) : 10
    Total Pages Failing (Data) : 0
    Total Pages Processed (Index) : 2
    Total Pages Failing (Index) : 0
    Total Pages Processed (Other) : 5
    Total Pages Empty : 0
    Total Pages Marked Corrupt : 0
    Total Pages Influx : 0
    Detecting and reporting data block corruption using the DBMS_REPAIR package:
    Note: This event can only be used if the block "wrapper" is marked corrupt.
    Eg: If the block reports ORA-1578.
    1. Create DBMS_REPAIR administration tables:
    To Create Repair tables, run the below package.
    SQL> EXEC DBMS_REPAIR.ADMIN_TABLES('REPAIR_ADMIN', 1, 1, 'REPAIR_TS');
    Note that table names are prefixed with 'REPAIR_' or 'ORPHAN_'. If the second variable is 1, it will create REPAIR_ tables; if it is 2, it will create ORPHAN_ tables.
    If the third variable is:
    1, the package performs 'create' operations;
    2, the package performs 'delete' operations;
    3, the package performs 'drop' operations.
    2. Scanning a specific table or Index using the DBMS_REPAIR.CHECK_OBJECT procedure:
    In the following example we check the table employee for possible corruptions; it belongs to the schema TEST. Let's assume that we have created our administration tables, called REPAIR_ADMIN, in schema SYS.
    To check the table block corruption use the following procedure:
    SQL> VARIABLE A NUMBER;
    SQL> EXEC DBMS_REPAIR.CHECK_OBJECT ('TEST','EMP', NULL,
    1,'REPAIR_ADMIN', NULL, NULL, NULL, NULL,:A);
    SQL> PRINT A;
    To check which block is corrupted, check in the REPAIR_ADMIN table.
    SQL> SELECT * FROM REPAIR_ADMIN;
    3. Fixing corrupt block using the DBMS_REPAIR.FIX_CORRUPT_BLOCK procedure:
    SQL> VARIABLE A NUMBER;
    SQL> EXEC DBMS_REPAIR.FIX_CORRUPT_BLOCKS ('TEST','EMP', NULL,
    1,'REPAIR_ADMIN', NULL,:A);
    SQL> SELECT MARKED FROM REPAIR_ADMIN;
    If you select the EMP table now, you will still get the error ORA-1578.
    4. Skipping corrupt blocks using the DBMS_REPAIR.SKIP_CORRUPT_BLOCKS procedure:
    SQL> EXEC DBMS_REPAIR.SKIP_CORRUPT_BLOCKS('TEST', 'EMP', 1, 1);
    Notice the trade-off of running the DBMS_REPAIR tool: you lose some data. Its main advantage is that you can retrieve the data past the corrupted block; however, the data in the corrupted block itself is lost.
    5. This procedure is useful in identifying orphan keys in indexes that are pointing to corrupt rows of the table:
    SQL> EXEC DBMS_REPAIR.DUMP_ORPHAN_KEYS('TEST', 'IDX_EMP', NULL,
    2, 'REPAIR_ADMIN', 'ORPHAN_ADMIN', NULL, :A);
    If you see any records in the ORPHAN_ADMIN table, you have to drop and re-create the index to avoid inconsistencies in your queries.
    6. The last thing you need to do while using the DBMS_REPAIR package is to run the DBMS_REPAIR.REBUILD_FREELISTS procedure to reinitialize the free list details in the data dictionary views.
    SQL> EXEC DBMS_REPAIR.REBUILD_FREELISTS('TEST', 'EMP', NULL, 1);
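    The six steps above can be condensed into one SQL*Plus sketch (schema TEST, table EMP, index IDX_EMP and the administration table names are the illustrative ones used in this section):

    ```sql
    VARIABLE A NUMBER;
    -- 2. Scan the table and record corrupt blocks in REPAIR_ADMIN
    EXEC DBMS_REPAIR.CHECK_OBJECT('TEST', 'EMP', NULL, 1, 'REPAIR_ADMIN', NULL, NULL, NULL, NULL, :A);
    -- 3. Mark the recorded blocks as software corrupt
    EXEC DBMS_REPAIR.FIX_CORRUPT_BLOCKS('TEST', 'EMP', NULL, 1, 'REPAIR_ADMIN', NULL, :A);
    -- 4. Let queries skip the corrupt blocks instead of raising ORA-1578
    EXEC DBMS_REPAIR.SKIP_CORRUPT_BLOCKS('TEST', 'EMP', 1, 1);
    -- 5. Dump index keys that point into the corrupt rows
    EXEC DBMS_REPAIR.DUMP_ORPHAN_KEYS('TEST', 'IDX_EMP', NULL, 2, 'REPAIR_ADMIN', 'ORPHAN_ADMIN', NULL, :A);
    -- 6. Reinitialize the free lists
    EXEC DBMS_REPAIR.REBUILD_FREELISTS('TEST', 'EMP', NULL, 1);
    ```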
    NOTE
    Setting events 10210, 10211, 10212, and 10225 can be done by adding the following line for each event in the init.ora file:
    Event = "event_number trace name errorstack forever, level 10"
    When event 10210 is set, the data blocks are checked for corruption by checking their integrity. Data blocks that don't match the format are marked as soft corrupt.
    When event 10211 is set, the index blocks are checked for corruption by checking their integrity. Index blocks that don't match the format are marked as soft corrupt.
    When event 10212 is set, the cluster blocks are checked for corruption by checking their integrity. Cluster blocks that don't match the format are marked as soft corrupt.
    When event 10225 is set, the fet$ and uset$ dictionary tables are checked for corruption by checking their integrity. Blocks that don't match the format are marked as soft corrupt.
    Set event 10231 in the init.ora file to cause Oracle to skip software- and media-corrupted blocks when performing full table scans:
    Event="10231 trace name context forever, level 10"
    Set event 10233 in the init.ora file to cause Oracle to skip software- and media-corrupted blocks when performing index range scans:
    Event="10233 trace name context forever, level 10"
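    Assuming your release supports it, the same skip behaviour can usually be enabled for a single session rather than instance-wide in init.ora; a hedged sketch:

    ```sql
    -- Session-level equivalents of the init.ora events above:
    ALTER SESSION SET EVENTS '10231 trace name context forever, level 10'; -- full table scans
    ALTER SESSION SET EVENTS '10233 trace name context forever, level 10'; -- index range scans
    ```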
    To dump an Oracle block, you can use the following command (available from 8.x onwards):
    SQL> ALTER SYSTEM DUMP DATAFILE 11 block 9;
    This command dumps data block 9 of datafile 11 into a trace file in the USER_DUMP_DEST directory.
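    A contiguous range of blocks can also be dumped in one statement (a sketch; exact syntax may vary by release):

    ```sql
    -- Dump blocks 9 through 12 of datafile 11 into a USER_DUMP_DEST trace file
    ALTER SYSTEM DUMP DATAFILE 11 BLOCK MIN 9 BLOCK MAX 12;
    ```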
    Dumping Redo Logs file blocks:
    SQL> ALTER SYSTEM DUMP LOGFILE '/usr/oracle8/product/admin/udump/rl.log';
    Block corruption in rollback segments will cause problems (ORA-1578) while starting up the database.
    With the support of Oracle, you can use the following underscore (hidden) parameter to start up the database:
    _CORRUPTED_ROLLBACK_SEGMENTS = (RBS_1, RBS_2)
    DB_BLOCK_COMPUTE_CHECKSUM
    This parameter is normally used to debug corruptions that happen on disk.
    The following V$ views contain information about blocks marked logically corrupt:
    V$BACKUP_CORRUPTION, V$COPY_CORRUPTION
    When this parameter is set, Oracle recomputes the checksum while reading a block from disk into the cache and compares it with the value stored in the block.
    If they differ, the block is corrupted on disk. Oracle marks the block as corrupt and signals an error. There is an overhead involved in setting this parameter.
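    As a sketch of enabling checksums and inspecting the corruption views above (the parameter is spelled DB_BLOCK_CHECKSUM in the documented releases; verify against your version):

    ```sql
    -- Enable checksum computation and verification on block I/O
    ALTER SYSTEM SET DB_BLOCK_CHECKSUM = TRUE SCOPE = SPFILE;
    -- Blocks found corrupt during backups and datafile copies
    SELECT * FROM V$BACKUP_CORRUPTION;
    SELECT * FROM V$COPY_CORRUPTION;
    ```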
    DB_BLOCK_CACHE_PROTECT=‘TRUE’
    Oracle will catch stray writes made by processes into the buffer cache.
    Oracle 9i new RMAN features:
    Obtain the datafile numbers and block numbers for the corrupted blocks. Typically, you obtain this output from the standard output, the alert.log, trace files, or a media management interface. For example, you may see the following in a trace file:
    ORA-01578: ORACLE data block corrupted (file # 9, block # 13)
    ORA-01110: data file 9: '/oracle/dbs/tbs_91.f'
    ORA-01578: ORACLE data block corrupted (file # 2, block # 19)
    ORA-01110: data file 2: '/oracle/dbs/tbs_21.f'
    $ rman target=rman/rman@rmanprod
    RMAN> run {
    2> allocate channel ch1 type disk;
    3> blockrecover datafile 9 block 13 datafile 2 block 19;
    4> }
    Recovering Data blocks Using Selected Backups:
    # restore from backupset
    BLOCKRECOVER DATAFILE 9 BLOCK 13 DATAFILE 2 BLOCK 19 FROM BACKUPSET;
    # restore from datafile image copy
    BLOCKRECOVER DATAFILE 9 BLOCK 13 DATAFILE 2 BLOCK 19 FROM DATAFILECOPY;
    # restore from backupset with tag "mondayAM"
    BLOCKRECOVER DATAFILE 9 BLOCK 13 DATAFILE 2 BLOCK 19 FROM TAG = mondayAM;
    # restore using backups made before one week ago
    BLOCKRECOVER DATAFILE 9 BLOCK 13 DATAFILE 2 BLOCK 19 RESTORE
    UNTIL 'SYSDATE-7';
    # restore using backups made before SCN 100
    BLOCKRECOVER DATAFILE 9 BLOCK 13 DATAFILE 2 BLOCK 19 RESTORE UNTIL SCN 100;
    # restore using backups made before log sequence 7024
    BLOCKRECOVER DATAFILE 9 BLOCK 13 DATAFILE 2 BLOCK 19 RESTORE
    UNTIL SEQUENCE 7024;
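    When corrupt blocks have already been recorded in V$DATABASE_BLOCK_CORRUPTION (for example by a BACKUP VALIDATE run), RMAN can also repair the whole list at once; a hedged sketch in RMAN syntax:

    ```sql
    -- RMAN prompt, not SQL*Plus: recover every block currently listed
    -- in V$DATABASE_BLOCK_CORRUPTION
    BLOCKRECOVER CORRUPTION LIST;
    ```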
    Post edited by:
    Min Angel (Yeon Hong Min, Korean)

  • K8N Neo2 and SATA data corruption problems

    I have been searching for a long time now for how to solve my problems:
    I have a data corruption problem on my PC: for example, when I rar a big file (around 600 MB) I get CRC errors on my SATA drive but not my IDE drive.
    When I encode a DivX I get image corruption, especially in movements.
    My config :
    Motherboard : MSI K8N Neo2 Platinum
    Processor : AMD64 3500+
    Memory : Corsair 2x512MB 3-3-3-8, tested all night with Memtest or Gold Memory : OK
    HD : IBM Sata 160Gb tested with hitachi test tool : OK
    HD: IBM Ide 80Gb : ok
    Graphics : Asus ATI 9600XT
    No overclocking !
    The shop from which I bought my components told me that it was a partitioning error ????????
    Well, after its "help" I still have the problem!
    But how can that be true, as I have created my partitions more than once!
    Once during the XP install, once with PartitionMagic.
    And if I create a partition, copy a file there and rar it, I get a CRC error.
    What can I do now? I am sick of reinstalling XP again and again!
    Could it be the SATA cable? From MSI, by the way... and the Hitachi test tool told me that the HD is OK.
    I have done installs without the Nvidia 5.10 drivers; it doesn't help.
    Do I have to install only the 5.03 drivers from MSI?
    Any idea?
    So as you see I am desperate, and I will probably buy another motherboard, as I can't guess where the problem is!
    Thanks for your help
    Kesasar

    Quote from: Supershanks on 20-February-05, 15:49:00
    If memory voltage is left on auto it defaults to 2.5v which may be underpowering your ram & causing the CRC issues.
    That's not quite true, though this "legend" is posted a lot 
    If left on "auto", the memory voltage is set by reading the SPD info of the RAM. You can easily check this with a multimeter. On my sys the voltage is correctly set to 2.7V with my Corsair memory.

  • Data corruption on MSI K7N2 Delta ILSR

    Hi all,
    Just got my new MSI K7N2 Delta ILSR up and running.
    Got some issues with the memory running at 200MHz while the Barton core of my Athlon XP 2600+ was running at 166MHz; this had to be 1:1 (thx forum  )
    While running the default bios settings with cpu/mem at 166/166 fsb everything is stable as hell. I have been running Prime95 for 2 days now and using the system for dvd ripping/encoding, UT2004 gaming etc.
    Then the overclocking began   I'm using the standard AMD cooler that came with my boxed Xp2600+ so I didn't expect an FSB of 200mhz.
    I stopped at an FSB of 190mhz...been crunching Prime95 for 18 hours now and again very stable.
    Now comes the problem: sometimes I get CRC errors when installing a game or decrypting a DVD!! But how can this be? Prime95 is stable at this 190 FSB overclock.
    When using an FSB of 166mhz no data corruption or crc errors.
    What can this be? It's not the cables because they're all the same as in my old setup which was running (ahum...walking) stable. PCI/AGP speeds are locked when overclocking...
    Only thing I can think of is the power supply. I've read in a warning post from Bas that the Q-Tec 550w psu I have is crap. So my guess is PSU, but maybe it's something else I'm forgetting...Any suggestions??
    The stupid thing is when I'm not overclocking I don't have the data corruption problems, but I'm still using the same crappy Q-Tec 550w PSU.
    Help!  
    gr,
    Remco

    Hmmm, that's curious. On the MSI K7N2 Delta ILSR product page Kingston memory is recommended  
    Ok, did a Memtest86 v1.11 pass at cpu/dimm 166/166 - no problems whatsoever. I think using Prime95 would have revealed memory problems...but will do another pass with cpu/dimm at 190/190.
    Can overclocking affect the Promise Raid Controller?? It should be locked like the PCI/AGP but who knows...
    Well, I still have to replace my PSU but I'm not 100% sure that this is the problem. What PSU would you recommend? ...and I'm also thinking about replacing the standard CPU cooler, any advice is welcome  
    gr,
    Remco

  • Data corruption with SSD after hard reset

    Hi, I'm using a Mac Mini for a car product and I MUST shut it down each time the hard way, i.e. by cutting the power.
    I am perfectly aware that this is definitely NOT the recommended way to shut down OSX because it might lead to data corruption but at the moment there are NO other options. So please don't simply suggest to shut it down the "good way", i.e. via software, because it simply isn't an option.
    Now, in the past I did lots of ON/OFF tests with conventional drives and had no problems. Recently I moved to SSD drives and it looks like I get more frequent boot problems (apple logo with the wheel spinning forever). Using DiskWarrior in these cases fixes the problem and repairs the folder structure of the drive. After that the drive boots again.
    Given the constraints for the application, i.e. shutdown == power cut, is there any way I can ensure better data integrity? If I disable write caching, would that help? Any other trick I could use to make the system more resilient? And finally, are SSDs actually more prone to crashes of this kind, or was that just a coincidence?
    Thanks a lot for the help, I hope some of you has experienced this problem in a similar situation (and found a good solution)
    cheers
    Emanuele

    There are OSX compatible UPSs that send status information to the computer via USB. This enables the computer to do a normal shutdown before the UPS battery runs out. Here's an example of the system preference that will show when a compatible UPS is connected:
    <http://www.cyberpowersystems.com/support/faqs/faqOSXUPS.html>
    Do you think disabling write caching would help at all?
    It might, but the problem may be entirely in the SSD. Do you have journaling enabled on the SSD? That could make a big difference.
    Finally, I'm using OCZ Solid 2. Do you have any comment on that or do you recommend something different? I doubt that SLC drives would help in this matter.
    You would have to contact the SSD makers and ask them how well they handle power failures. It should be possible to make them no worse than a normal hard drive, but there may be some compromises they have to make for performance reasons.

  • Time and date reset bug

    Hi guys, recently I've been experiencing a time and date reset bug. Sometimes when I open up my computer it says something like "your date is set to..." (I forgot the exact date, but the year is 2001), so I change it back to the correct time and date. I'm not sure how I got the bug. Can anyone help me fix this problem? It's beginning to be annoying. Thanks in advance!!

    Sherwin,
    Those symptoms indicate that you need to get your backup (PRAM) battery replaced. If your computer does not retain parameter RAM (PRAM) settings when it is turned off, this generally indicates that the battery needs to be changed. ;~)

  • Firefox 10.0.2 causes "image data corrupted" warning message during update try on OSX 10.7.3?

    I tried to update firefox 10.0.0 to 10.0.2 on Mac Pro OSX 10.7.3 which failed with warning "Image data corrupted". Any ideas what to do?

    If there are problems with updating or with permissions, the easiest fix is to download the full version and trash the currently installed version to do a clean install of the new version.
    Download a new copy of the Firefox program and save the disk image (dmg) file to the desktop
    *Firefox 10.0.x: http://www.mozilla.org/en-US/firefox/all.html
    *Trash the current Firefox application to do a clean (re-)install
    *Install the new version that you have downloaded
    Your profile data is stored elsewhere in the Firefox Profile Folder, so you won't lose your bookmarks and other personal data if you uninstall and (re)install Firefox.
    *http://kb.mozillazine.org/Profile_folder_-_Firefox

  • HT5167 image data corrupted message when installing anything. Please help!!

    A couple of days ago something happened, and now every time I try to install an app I receive the following message: "(app being installed file name).dmg image data corrupt". Can anyone point me in the right direction please???

    All of the posters need to take their systems to an Apple Genius Bar and have whatever is wrong with them fixed.
    It is either some software you have installed, the hard drive has become corrupted in some way, or you have a hardware problem.
    Or all of the above.
    Wish all of you good luck.

  • New Apple Macbook Pro...data corruption

    Apple is warning users of its latest MacBook Pro about an issue with the laptop that may cause data corruption. The tech giant released a firmware update on Wednesday to fix a problem with the laptop's flash storage component. The 15-inch mid-2015 Retina model is affected, with corruption occurring in "rare cases", the company said. News of the warning comes as thousands of MacBook owners are campaigning for action over stains appearing on the laptop's Retina screen. http://www.bbc.com/news/technology-33652239 (Incidentally... this device has a starting price of £1599......) 

    When I was a lot younger I vowed I would NEVER say 'in my day etc etc etc'... so I won't... Instead I will say when my Mum and Dad bought their house... it cost them £350......
