RAC node upgrade issue

Our company's database runs on Oracle Real Application Clusters (RAC) with two nodes. We would like to perform hardware upgrades on both nodes. Is it OK to shut down one instance at a time and remove all the network/interconnect cables from that node while the other RAC node keeps working? After one node is upgraded and all the network/interconnect cables are connected back to it, will everything work as before, or are there certain things to be cautious about?
Thanks in advance

I think you have to do a clean shutdown so that no instance recovery is required when you reopen the database.
SQL> SHUTDOWN TRANSACTIONAL
This allows all current transactions to complete and then shuts down the instance.
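
The clean-shutdown advice above can be sketched as a dry run that only prints the commands for taking one node out of service. This is a minimal sketch under assumptions: the function name `plan_node_shutdown` and the database and instance names `ORCL` and `ORCL1` are hypothetical, and `crsctl disable crs` is only a suggestion to keep the stack from autostarting while the hardware work is in progress. Check the commands against your Clusterware version before running any of them.

```sh
#!/bin/sh
# Dry-run sketch: print (do not execute) the commands for taking one
# RAC node out of service. ORCL / ORCL1 are hypothetical names.
plan_node_shutdown() {
    db="$1"
    inst="$2"
    # Let in-flight transactions finish, then stop only this instance.
    echo "srvctl stop instance -d $db -i $inst -o transactional"
    # As root: stop the Clusterware stack on this node.
    echo "crsctl stop crs"
    # As root: keep the stack from autostarting during the hardware work.
    echo "crsctl disable crs"
}

plan_node_shutdown ORCL ORCL1
```

After the upgrade, the reverse would be `crsctl enable crs` followed by `crsctl start crs` on the upgraded node, again as root.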

Similar Messages

  • Issues while Add / Delete RAC Node in Oracle 10g R2

    Hi,
    I have a requirement to add a new node to an existing 2-node production RAC, where one node is active and the other is passive because of a licence issue; we cannot keep both nodes active. Due to performance issues (memory, CPU cores, etc.) we are adding another node.
    We plan to add a third database node, make the new node active and the currently active one passive (a swap), and later, after final observation, delete and decommission the current passive node.
    This activity has been tested on the Dev database with the same infrastructure (OS, memory, etc.), but we want to check the best approach and the challenges we may face during RAC node addition/deletion.
    RAC DB Version : 10.2.0.4
    OS Version : RHEL 5.8
    (1) Is the approach the right one: first add the node and delete the old one later?
    (2) If the approach is correct, what would be the behaviour of the third node in terms of active or passive?
    (3) We have taken an RMAN backup, an OS backup, and backups of CRS, ORACLE_HOME, ASM_HOME, OCR and the voting disk (VD).
    (4) Could you please give detailed steps for adding/deleting a node in 10g R2?
    (5) Are there any known bugs in this DB release or OS when performing this activity?
    Since this is a production machine we want to be more proactive. Please correct me or add anything I am missing.
    With Thanks,
    Rakesh

    Hello Rakesh,
    Please follow the following steps.
    Node Addition Steps
    1. Install and configure OS and hardware for new node.
    2. Add Oracle Clusterware to the new node.
    3. Configure ONS for the new node.
    4. Add ASM home to the new node.
    5. Add Database home to the new node.
    6. Add a listener to the new node.
    7. Add ASM instance to the New Node.
    8. Add a database instance to the new node.
    Details of steps
    1. run cluvfy to verify whether New node is ready for addition or not.
         $ cluvfy stage -pre crsinst -n node2
    2. from node1, execute
              $/u01/app/crs11g/oui/bin/addNode.sh
    3. Specify node2 vip address and follow instructions.
    4. At the end of the installation it may throw a warning and ask you to click YES. Click YES.
    5. from node1,
              /u01/app/crs11g/bin/racgons add_config node2:6200
    6. From node1, set ORACLE_HOME=ASM_HOME, then execute addNode.sh from $ASM_HOME/oui/bin and follow the instructions.
    7. From node1, set ORACLE_HOME=DB_HOME and then
         /u01/app/oracle/product/11.1.0/db_1/oui/bin/addNode.sh
         and Follow instructions.
    8. from node2 start NETCA and configure listener for new node. While configuring Listener select the name of new node.
    9. from node1 start dbca from ASM Home to configure ASM instance for new node.
    10. Again from node1 start dbca from DB Home to add DB instance
    Node deletion Steps
    1. Delete the Database instance on the node to be deleted.
    2. Clean up the ASM instance.
    3. Remove the listener from the node to be deleted.
    4. Remove the node from the database.
    5. Remove the node from ASM.
    6. Remove ONS configuration from the node to be deleted.
    7. Remove the node from the clusterware
    Details of Steps
    1. Remove database Instance of node2
         Dbca -> instance Management -> delete instance -> password for sys -> select node -> finish.
    2. Stop asm for node2 from any nodes.
         $ srvctl stop asm -n node2
    3. Remove asm for node2
         $ srvctl remove asm -n node2
    4. Remove Listener from Node2 using NETCA.
    5. From Node2:
              ./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=node2" -local
    6. From Node2, start runInstaller from Oracle_DB_Home/oui/bin, and remove "DB_HOME"
         $ ./runInstaller
         On the WELCOME Screen -> Deinstall product -> Select dbhome name (OraDb10g_Home1) -> Remove
    7. From Node1:
              ./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=node1"
    8. From Node2, set Oracle_Home to asm_1 and then fire:
              ./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=node2" -local
    9. From Node2, start OUI and deinstall ASM Home.
    10. From Node1, Set ORACLE_HOME= /u01/app/oracle/product/11.1.0/asm_1
    11. From Node1: from /u01/app/oracle/product/11.1.0/asm_1/oui/bin, start OUI
              ./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=node1"
    12. From Node2: as a root user (#) execute rootdelete.sh from /u01/app/crs11g/install
         # /u01/app/crs11g/install/rootdelete.sh
    13.From Node-1 first find out the node numbers
         # /u01/app/crs11g/bin/olsnodes -n
         output : node1 1
              node2 2
    14. From Node-1 as a root user (#):
         # /u01/app/crs11g/install/rootdeletenode.sh node2[Node_Name] 2[node_no]
         output:
              CRS nodeapps are deleted successfully
              clscfg: EXISTING configuration version 4 detected.
              clscfg: version 4 is 11 Release 1.
              Node deletion operation successful.
              'node2' deleted successfully
    15. From Node2 set ORACLE_HOME=CRS_HOME and then execute
         $ $ORACLE_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=node2" CRS=TRUE -local
    16. ./runInstaller and remove CRS_HOME
    17. From Node-1:
         $ /u01/app/crs11g/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=node1" CRS=TRUE
    18. check node is deleted from ./crs_stat -t
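
Steps 13 and 14 of the deletion procedure need the node number that `olsnodes -n` reports for the node being removed. A minimal sketch of looking it up from that output, assuming the two-column "name number" layout shown in step 13; the helper name `node_number` is hypothetical:

```sh
#!/bin/sh
# Read `olsnodes -n` style output ("name number" per line) on stdin and
# print the node number for the requested node name.
node_number() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Sample input copied from the output shown in step 13.
printf 'node1 1\nnode2 2\n' | node_number node2
```

On a live cluster the input would come from `/u01/app/crs11g/bin/olsnodes -n` instead of `printf`.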

  • EBS 12.1.3 Upgrade issue

    Hi,
    I installed 12.1.1 with the Vision database on my machine, which runs 64-bit Linux 5 update 5, and everything went fine. After that I decided to upgrade to 12.1.3. I installed the following patches in the order given below:
    1) R12.AD.B.DELTA.3 Patch 9239089
    2) Patch 9239090
    3) Patch 9239095
    4) 9817770:R12.ATG_PF.B POST-R12.ATG_PF.B.DELTA.3CONSOLIDATED PATCH
    5) 9966055:R12.FND.B [TRANSLATED VERSION OF FNDSCSGN NOT LAUNCHED
    The only issue I had was during the compilation of one form library: I got an error in Patch 9239090 and chose to continue. The patches completed successfully (I think), since I can log into the applications. My issue is that I have a lot of invalid objects. I tried to compile them manually but failed, then followed one of the other threads and ran the utlrp.sql and utlirp.sql scripts as per (R12.1.1 - Invalid Objects After Patching [ID 1093163.1]).
    I have followed that document and tried to run but my invalid count stays the same. Here is the snapshot when I query the database :
    SQL> SELECT COUNT(*), OWNER FROM DBA_OBJECTS WHERE STATUS='INVALID'
      2  GROUP BY OWNER;
      COUNT(*) OWNER
             1 RE
             2 CA
             4 PUBLIC
             1 HERMAN
           228 APPS
             2 FLOWS_010500
    SQL> @$ORACLE_HOME/rdbms/admin/utlrp.sql
    TIMESTAMP
    COMP_TIMESTAMP UTLRP_BGN 2012-04-05 07:23:05
    DOC> The following PL/SQL block invokes UTL_RECOMP to recompile invalid
    DOC> objects in the database. Recompilation time is proportional to the
    DOC> number of invalid objects in the database, so this command may take
    DOC> a long time to execute on a database with a large number of invalid
    DOC> objects.
    DOC>
    DOC> Use the following queries to track recompilation progress:
    DOC>
    DOC> 1. Query returning the number of invalid objects remaining. This
    DOC> number should decrease with time.
    DOC> SELECT COUNT(*) FROM obj$ WHERE status IN (4, 5, 6);
    DOC>
    DOC> 2. Query returning the number of objects compiled so far. This number
    DOC> should increase with time.
    DOC> SELECT COUNT(*) FROM UTL_RECOMP_COMPILED;
    DOC>
    DOC> This script automatically chooses serial or parallel recompilation
    DOC> based on the number of CPUs available (parameter cpu_count) multiplied
    DOC> by the number of threads per CPU (parameter parallel_threads_per_cpu).
    DOC> On RAC, this number is added across all RAC nodes.
    DOC>
    DOC> UTL_RECOMP uses DBMS_SCHEDULER to create jobs for parallel
    DOC> recompilation. Jobs are created without instance affinity so that they
    DOC> can migrate across RAC nodes. Use the following queries to verify
    DOC> whether UTL_RECOMP jobs are being created and run correctly:
    DOC>
    DOC> 1. Query showing jobs created by UTL_RECOMP
    DOC> SELECT job_name FROM dba_scheduler_jobs
    DOC> WHERE job_name like 'UTL_RECOMP_SLAVE_%';
    DOC>
    DOC> 2. Query showing UTL_RECOMP jobs that are running
    DOC> SELECT job_name FROM dba_scheduler_running_jobs
    DOC> WHERE job_name like 'UTL_RECOMP_SLAVE_%';
    DOC>#
    PL/SQL procedure successfully completed.
    TIMESTAMP
    COMP_TIMESTAMP UTLRP_END 2012-04-05 07:24:29
    DECLARE
    ERROR at line 1:
    ORA-00904: "FALSE": invalid identifier
    ORA-06512: at line 13
    DOC> The following query reports the number of objects that have compiled
    DOC> with errors (objects that compile with errors have status set to 3 in
    DOC> obj$). If the number is higher than expected, please examine the error
    DOC> messages reported with each object (using SHOW ERRORS) to see if they
    DOC> point to system misconfiguration or resource constraints that must be
    DOC> fixed before attempting to recompile these objects.
    DOC>#
    OBJECTS WITH ERRORS
    4
    DOC> The following query reports the number of errors caught during
    DOC> recompilation. If this number is non-zero, please query the error
    DOC> messages in the table UTL_RECOMP_ERRORS to see if any of these errors
    DOC> are due to misconfiguration or resource constraints that must be
    DOC> fixed before objects can compile successfully.
    DOC>#
    ERRORS DURING RECOMPILATION
    4
    PL/SQL procedure successfully completed.
    Invoking Ultra Search Install/Upgrade validation procedure VALIDATE_WK
    Ultra Search VALIDATE_WK done with no error
    PL/SQL procedure successfully completed.
    However, for some reason utlrp.sql reports only 4 invalid objects while it runs, but the number changes once the script has finished and I am not sure why. If I query SELECT COUNT(*) FROM obj$ WHERE status IN (4, 5, 6) after utlrp.sql has finished, I get the following:
    SQL> SELECT COUNT(*) FROM obj$ WHERE status IN (4, 5, 6);
    COUNT(*)
    234
    But if I run the same query while utlrp.sql is running, I have seen a number as low as 10; once the script finishes it goes back up. I don't know what to do. I have run AutoConfig on both the DB and apps tiers. Can anyone suggest anything? I am totally lost on this.
    Thanks..

    Hi, I set _disable_fast_validate=TRUE and re-ran only utlrp.sql to recompile. Now many more packages have compiled, but I still have some invalid objects. Can I ignore these and move on, or are they important packages/objects in APPS which may impact functionality?
    Below are the remaining invalid objects.
    Thanks
    OWNER,OBJECT_NAME,SUBOBJECT_NAME,OBJECT_ID,DATA_OBJECT_ID,OBJECT_TYPE,CREATED,LAST_DDL_TIME,TIMESTAMP,STATUS,TEMPORARY,GENERATED,SECONDARY,NAMESPACE,EDITION_NAME
    PUBLIC,WWV_FLOW_LIST_OF_VALUES_DATA,,1006661,,SYNONYM,09-FEB-07,09-FEB-07,2007-02-09:00:33:03,INVALID,N,N,N,1,ORA$BASE
    PUBLIC,WWV_FLOW_LISTS_OF_VALUES$,,1006663,,SYNONYM,09-FEB-07,09-FEB-07,2007-02-09:00:33:03,INVALID,N,N,N,1,ORA$BASE
    PUBLIC,WWV_FLOW_GENERIC,,1006687,,SYNONYM,09-FEB-07,09-FEB-07,2007-02-09:00:33:03,INVALID,N,N,N,1,ORA$BASE
    PUBLIC,WWV_FLOW_FIELD_TEMPLATES,,1006699,,SYNONYM,09-FEB-07,09-FEB-07,2007-02-09:00:33:03,INVALID,N,N,N,1,ORA$BASE
    RE,RE_PROFILER,,1207268,0,PACKAGE BODY,17-NOV-04,05-APR-12,2012-04-05:10:52:51,INVALID,N,N,N,2,
    HERMAN,RDT_1,,1713973,1713973,TABLE,18-JAN-06,18-JAN-06,2006-01-18:13:18:02,INVALID,N,N,N,1,
    APPS,XLA_00707_AAD_C_000026_PKG,,2362389,0,PACKAGE BODY,03-AUG-07,05-APR-12,2012-04-05:10:52:52,INVALID,N,N,N,2,
    APPS,XLA_20065_AAD_C_000030_PKG,,2370236,0,PACKAGE BODY,16-AUG-07,05-APR-12,2012-04-05:10:53:05,INVALID,N,N,N,2,
    APPS,FSAH_DUPLICATE_PKG,,2385307,0,PACKAGE BODY,23-AUG-07,05-APR-12,2012-04-05:10:53:08,INVALID,N,N,N,2,
    APPS,XLA_00707_AAD_C_000044_PKG,,2661674,0,PACKAGE BODY,14-JAN-08,05-APR-12,2012-04-05:10:53:08,INVALID,N,N,N,2,
    APPS,MSD_DEM_OBI_DEMANTRA_MV,,3255317,,MATERIALIZED VIEW,09-JUL-08,09-JUL-08,2008-07-09:09:14:28,INVALID,N,N,N,19,
    CA,F,,3260665,3260665,TABLE,24-SEP-08,24-SEP-08,2008-09-24:16:00:14,INVALID,N,N,N,1,
    CA,G,,3260683,3260683,TABLE,24-SEP-08,24-SEP-08,2008-09-24:16:00:15,INVALID,N,N,N,1,
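
To judge which of the remaining invalid objects matter, it can help to group the listing by owner and object type first. A minimal sketch, assuming the listing is saved as CSV with the column order of the header above (OWNER is field 1, OBJECT_TYPE is field 6, STATUS is field 10); the function name `summarize_invalid` is hypothetical:

```sh
#!/bin/sh
# Summarize a DBA_OBJECTS CSV export (header row first) as
# "count owner object_type", keeping only rows whose STATUS is INVALID.
summarize_invalid() {
    awk -F, 'NR > 1 && $10 == "INVALID" { n[$1 " " $6]++ }
             END { for (k in n) print n[k], k }' | sort -k2
}

# Three rows copied from the listing above, preceded by a shortened header.
summarize_invalid <<'EOF'
OWNER,OBJECT_NAME,SUBOBJECT_NAME,OBJECT_ID,DATA_OBJECT_ID,OBJECT_TYPE,CREATED,LAST_DDL_TIME,TIMESTAMP,STATUS
APPS,FSAH_DUPLICATE_PKG,,2385307,0,PACKAGE BODY,23-AUG-07,05-APR-12,2012-04-05:10:53:08,INVALID,N,N,N,2,
APPS,XLA_00707_AAD_C_000044_PKG,,2661674,0,PACKAGE BODY,14-JAN-08,05-APR-12,2012-04-05:10:53:08,INVALID,N,N,N,2,
CA,F,,3260665,3260665,TABLE,24-SEP-08,24-SEP-08,2008-09-24:16:00:14,INVALID,N,N,N,1,
EOF
```

The summary only organizes the list; whether a given APPS package body can be ignored still depends on whether the corresponding product is in use.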

  • RAC node restarting!

    hi
    One of our RAC environments keeps restarting.
    I've disabled init.cssd, init.crs and init.evmd in /etc/inittab in order to check the logs.
    This is the situation:
    crsd.log:
    2009-02-04 00:09:00.118: [ COMMCRS][9]clsc_connect: (8000000100318640) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_node1_loud))
    2009-02-04 00:09:00.132: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
    2009-02-04 00:09:00.134: [  CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..
    2009-02-04 00:09:08.016: [    CRSD][1]32Daemon Version: 10.2.0.2.0 Active Version: 10.2.0.2.0
    2009-02-04 00:09:08.016: [    CRSD][1]32Active Version and Software Version are same
    2009-02-04 00:09:08.017: [ CRSMAIN][1]32Initializing OCR
    2009-02-04 00:09:08.037: [  OCRRAW][1]proprioo: for disk 0 (/dev/rdsk/ora_ocr_raw), id match (1), my id set (752560621,1028247821) total id sets (1), 1st set
    (752560621,1028247821), 2nd set (0,0) my votes (2), total votes (2)
    2009-02-04 00:09:08.140: [ CSSCLNT][24]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
    ocssd.log:
    [    CSSD]2009-02-03 21:52:08.651 [9] >USER: clssnmHandleUpdate: NODE 1 (node1l) IS ACTIVE MEMBER OF CLUSTER
    [    CSSD]2009-02-03 21:52:08.651 [9] >TRACE: clssnmHandleUpdate: diskTimeout set to (200000)ms
    [    CSSD]2009-02-03 21:52:08.651 [16] >TRACE: clssnmWaitForAcks: done, msg type(15)
    [    CSSD]2009-02-03 21:52:08.651 [16] >TRACE: clssnmDoSyncUpdate: Sync Complete!
    [    CSSD]2009-02-03 21:52:08.722 [1] >USER: NMEVENT_SUSPEND [00][00][00][00]
    [    CSSD]2009-02-03 21:52:08.724 [17] >TRACE: clssgmReconfigThread: started for reconfig (1)
    [    CSSD]2009-02-03 21:52:08.749 [17] >USER: NMEVENT_RECONFIG [00][00][00][02]
    [    CSSD]2009-02-03 21:52:08.749 [17] >TRACE: clssgmEstablishConnections: 1 nodes in cluster incarn 1
    [    CSSD]2009-02-03 21:52:08.751 [13] >TRACE: clssgmPeerListener: connects done (1/1)
    [    CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmEstablishMasterNode: MASTER for 1 is node(1) birth(1)
    [    CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmChangeMasterNode: requeued 0 RPCs
    [    CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmMasterCMSync: Synchronizing group/lock status
    [    CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmMasterSendDBDone: group/lock status synchronization complete
    [    CSSD]CLSS-3000: reconfiguration successful, incarnation 1 with 1 nodes
    [    CSSD]CLSS-3001: local node number 1, master node number 1
    [    CSSD]2009-02-03 21:52:08.753 [17] >TRACE: clssgmReconfigThread: completed for reconfig(1), with status(1)
    [    CSSD]2009-02-03 21:52:08.863 [10] >TRACE: clssgmClientConnectMsg: Connect from con(80000001008fd2a0) proc(8000000100ae26a8) pid() proto(10:2:1:1)
    [    CSSD]2009-02-03 21:52:08.864 [10] >TRACE: clssgmClientConnectMsg: Connect from con(8000000100ae0128) proc(8000000100ae2a10) pid() proto(10:2:1:1) from con(8000000100aa32c0) proc(8000000100aa5b90) pid() proto(10:2:1:1)
    alertlog:
    [cssd(2535)]CRS-1601:CSSD Reconfiguration complete. Active nodes are node1 .
    2009-02-03 23:55:20.821
    [cssd(2575)]CRS-1605:CSSD voting file is online: /dev/rdsk/ora_voting_raw. Details in /work/crs/product/10.2/crs/log/lourmel/cssd/ocssd.log.
    2009-02-03 23:55:28.376
    evmd.log:
    Oracle Database 10g CRS Release 10.2.0.2.0 Production Copyright 1996, 2004, Oracle. All rights reserved
    2009-02-04 00:08:58.331: [    EVMD][1]32EVMD waiting for CSS to be ready err = 3
    2009-02-04 00:08:59.939: [ COMMCRS][9]clsc_connect: (800000010007d658) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_node1_loud))
    2009-02-04 00:08:59.946: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
    2009-02-04 00:08:59.948: [    EVMD][1]32EVMD waiting for CSS to be ready err = 3
    2009-02-04 00:09:07.596: [ CSSCLNT][1]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
    syslog:
    Feb 4 00:08:41 lourmel syslog: Oracle Cluster Ready Services starting up automatically.
    Feb 4 00:08:45 lourmel sfd[2153]: starting the daemon.
    Feb 4 00:08:45 lourmel su: + tty?? root-orac
    Feb 4 00:08:45 lourmel krsd[2152]: Delay time is 300 seconds
    Feb 4 00:08:43 lourmel syslog: Oracle Cluster Ready Services starting up automatically.
    Feb 4 00:08:52 lourmel above message repeats 2 times
    Feb 4 00:08:52 lourmel syslog: Cluster Ready Services completed waiting on dependencies.
    Feb 4 00:08:53 lourmel syslog: Running CRSD with TZ =
    When I ran crs_stat (before the restart) I got the message:
    ORA-0184: Cannot communicate with CRS
    crsctl check crs gives us:
    Failure 1 contacting CSS daemon
    Cannot communicate with CRS
    Cannot communicate with EVM
    As I said before, the machine keeps restarting.
    Does anyone have an idea? Please.

    Dear All,
    I recently upgraded a few RAC setups to Oracle 10g Patchset 3 (10.2.0.4) on Linux servers.
    In one of the RAC setups, the servers were rebooting daily. The same setup had been working fine and the problem started only after applying the patchset. I checked all the logs and found nothing relevant.
    Then I checked what was added with this patchset.
    The most interesting finding: Oracle added a new daemon, oprocd.
    # ps -efl | grep oprocd
    4 S root 6440 6063 0 -40 - - 2114 - Mar03 ? 00:00:00 /opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 500 -hsi 5:10:50:75:90 -f
    These are the interesting points about the line above:
    1. The process runs as the root user.
    2. It runs with the highest priority (-40).
    3. It probes every second (-t 1000).
    4. It waits up to 500 milliseconds for a CPU response (-m 500 means the margin time is 500 milliseconds).
    5. It runs in fatal mode (-f).
    My conclusion from these points: the daemon probes the CPU every second and waits for a response within 500 milliseconds. If it gets no response from the CPU within 500 milliseconds, it assumes the CPU is hung and reboots the machine. The operating system does not get enough time to write the system logs before the server reboots.
    So the solution is to increase the margin time from 500 milliseconds to 10 seconds.
    These are the steps to increase the margin time.
    Please remember: the modification requires downtime, and you need to stop the cluster service on all member nodes.
    1. Stop The CRS Process
    #crsctl stop crs
    #<CRS_HOME>/bin/oprocd stop
    2. Ensure that Clusterware stack is down and not running
    #ps -ef |egrep "crsd.bin|ocssd.bin|evmd.bin|oprocd"
    This should return no processes.
    3. From one node of the cluster, change the value of the "diagwait" parameter to 13 by issuing the command as root:
    #crsctl set css diagwait 13 -force
    4. Check if diagwait is successfully set.
    #crsctl get css diagwait
    5. Restart the Oracle Clusterware on all the nodes by executing:
    #crsctl start crs
    (Note: If you face any problem restarting the CRS services, ASM and database, you can reboot the nodes. The cluster and database will come up automatically through the init startup scripts.)
    6. The oprocd daemon process will show with -m 10000
    # ps -efl| grep oprocd
    4 S root 6440 6063 0 -40 - - 2114 - Feb02 ? 00:00:00 /opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 10000 -hsi 5:10:50:75:90 -f
    Rollback procedure:
    If you need to unset the oprocd value for any reason:
    #crsctl unset css diagwait
    I am confident the abnormal RAC node restart problem will be solved with this workaround.
    Regards,
    Sumit
    Bangalore,India
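
The check in step 6 can be scripted by pulling the margin out of the oprocd command line. A minimal sketch, assuming the `ps` line layout shown above, where the margin in milliseconds follows the `-m` flag; the function name `oprocd_margin_ms` is hypothetical:

```sh
#!/bin/sh
# Read one `ps -efl` line for oprocd on stdin and print the value that
# follows the "-m" flag (the margin, in milliseconds).
oprocd_margin_ms() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "-m") { print $(i + 1); exit } }'
}

# Sample line copied from the ps output shown before the change.
echo '4 S root 6440 6063 0 -40 - - 2114 - Mar03 ? 00:00:00 /opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 500 -hsi 5:10:50:75:90 -f' | oprocd_margin_ms
```

On a live node the input could come from `ps -efl | grep '[o]procd'`; after the diagwait change the printed value should be 10000 rather than 500.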

  • What is best use of 1400 gb SGA (2 rac nodes 768gb each)

    We are currently using 11.2.0.3.0 on a Sun Unix server with 2 RAC nodes, each with 8 UltraSPARC-T1 CPUs (released in 2005), four threads each, so Oracle sees 32 CPUs, and they are very slow (1.2 gb). The database is 4 TB in size on a regular SAN (10k disks).
    8 GB SGA.
    Our new boss wants to upgrade the system to the max to get the best performance possible. Money is a concern, of course, but the budget is pretty high. Our use case is 12-16 concurrent users running reports, some small and others very large (returning a single row or tens of thousands of rows). Reports take 5 seconds to 5 minutes. Our job is to get the fastest system possible. We have a total of 8 licenses available, so we can have 16 cores. We are also getting a 6 TB all-flash SSD array for the database. We can get any CPU we want, but we can't use parallel query because of all kinds of issues we have experienced (too many slaves, RAC interconnect saturation, etc.; whack-a-mole). SPARC has too many threads, and without parallel query Oracle runs a query in a single thread.
    We have specced out the following system for each RAC node:
    HP ProLiant DL380p Gen8 8 SFF server
    2 Intel Xeon E5-2637v2 3.5GHz/4-core cpus
    768 gb ram
    2 HP 300GB 6G SAS 15K drives for database software
    This will give us a total of 4 Xeon E5-2637v2 CPUs, 16 cores in all (0.5 core factor for 8 licenses), and 1536 GB of RAM (leaving ~1400 GB for SGA). This will guarantee an available core for each user. We intend to create a very large keep pool, around 300 GB on each node, that will hold all our dimension tables. We hope this will reduce reads from the SSD to just data from the fact tables.
    Are we doing massive overkill here? The budget for this was way less than what our boss expected. Will an SGA that big be wasted, and would, say, 256 GB be fine? Or will Oracle take advantage of it and be able to keep most blocks in there?
    Will an SGA that big cause Oracle problems due to the overhead of handling that much RAM?
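
The sizing arithmetic in the post can be checked with a few lines of shell. This is only a sketch of the numbers quoted above (2 nodes, 2 four-core sockets per node, 768 GB per node, a 0.5 core factor, roughly 1400 GB planned for SGA); the variable names are made up for illustration:

```sh
#!/bin/sh
# Figures taken from the post; variable names are illustrative only.
nodes=2
sockets_per_node=2
cores_per_socket=4
ram_per_node_gb=768
planned_sga_total_gb=1400

total_cores=$((nodes * sockets_per_node * cores_per_socket))
# Core factor 0.5 as stated in the post: licenses = cores / 2.
licenses=$((total_cores / 2))
total_ram_gb=$((nodes * ram_per_node_gb))
# Each RAC instance has its own SGA, so the planned total splits per node.
sga_per_node_gb=$((planned_sga_total_gb / nodes))

echo "cores=$total_cores licenses=$licenses ram_gb=$total_ram_gb sga_per_node_gb=$sga_per_node_gb"
```

Note that the 1400 GB figure is a total across the cluster: each instance caches blocks in its own SGA of about 700 GB, and a 300 GB keep pool on each node comes out of that per-node figure.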

    Current System:
    ===========
    a. Version : 11.2.0.3
    b. Unix Sun
    c. CPU - 8 cpus with 4 threads => 32 logical cpus or cores
    d. database 4TB
    e. SAN - 10k speed disk drives
    f. 8gb SGA
    g. 1.2 gb ??
    h. Users --> 12-16 concurrent and run reports varying size
    i. reports elasped time 5 sec to 5 mins
    j. cpu license -->8
    Target System
    ===========
    a. Version: 11.2.0.3
    b. HP ProLiant DL380p Gen8 8 SFF server
    c. RAM --> 768 GB
    d. 2 HP 300GB 6G SAS 15K drives for database software
    e. large keep pool -->90 gb to  hold all dimension tables. 
    f.  SSD to just data from fact tables
    g. SGA -->256gb
    A reassessment of the performance issues of the current system appears to be required. A good performance tuning expert should look into the tuning issues of the current application by analyzing AWR performance metrics. If an 8 GB SGA is not enough, the likely reason is that the queries running in the system do not have good access paths that select less data, so recently used buffers from the different tables involved in the queries get flushed out. Until those issues are identified, the performance problem will not go away wherever you go: as table sizes increase in future, the problem will reappear. If the queries run mostly with full scans, then re-platforming to Exadata might be the right decision, as Exadata has Smart Scan and cell offloading, which work faster and might be the right direction for best performance and the best investment for the future. Compression (COMPRESS FOR OLTP) is another feature that could be exploited to improve efficiency further by reading fewer blocks in less time.
    Investment in infrastructure will solve a few issues in the short term, but the long-term issues will arise again.
    Investing in identifying the performance issues of the current system would be the best investment in the current scenario.

  • Errors in Forte 3.5 -Upgrade issue

    hello ,
    We are having frequent disruptions in the communication SO, which has an
    externalConnection class. Did anything change in 3.5? We did not have
    these errors in 3L!
    After these errors, we have a mutex locking problem which leaves our
    application hanging.
    Any help will be greatly appreciated!
    thanks
    suma
    Here is an Excerpt of the log file
    Task 9: extConn4030a808.Write: 194 bytes written to 10
    INFORMATION: Network partner closed connection. This usually means the
    process at the other end of the wire failed. Please go look there and
    find
    out why.
    Class: qqsp_DistAccessException
    Error #: [501, 152]
    Detected at: qqcm_HoseFSM::ReceivedClose at 2
    Error Time: Thu May 24 09:16:29
    Exception occurred (locally) on partition "MerlinWindows_cl41_Part8",
    (partitionId = F2E11800-5003-11D5-BF99-AE145F2AAA77:0x105, taskId =
    [F2E11800-5003-11D5-BF99-AE145F2AAA77:0x105.7]) in application
    "MerlinWindows_cl41", pid 23406 on node laxrc2 in environment laxrc2.
    INFORMATION: Error parameters for Set:0 Msg:0:
    Class: qqsp_DistAccessException
    Detected at: qqcm_HoseFSM::ReceivedClose at 1
    Error Time: Thu May 24 09:16:29
    Exception occurred (locally) on partition "MerlinWindows_cl41_Part8",
    (partitionId = F2E11800-5003-11D5-BF99-AE145F2AAA77:0x105, taskId =
    [F2E11800-5003-11D5-BF99-AE145F2AAA77:0x105.7]) in application
    24-May-2001 09:16:47: VH.G.G..GUIT/GUI0/PLAXRC172.23.4.401/../..;PGYX;
    Task 9: extConn4030a808.Write: 233 bytes written to 10
    INFORMATION: Network partner closed connection. This usually means the
    process at the other end of the wire failed. Please go look there and
    find
    out why.
    Class: qqsp_DistAccessException
    Error #: [501, 152]
    Detected at: qqcm_HoseFSM::ReceivedClose at 2
    Error Time: Thu May 24 09:16:47
    Exception occurred (locally) on partition "MerlinWindows_cl41_Part8",
    (partitionId = F2E11800-5003-11D5-BF99-AE145F2AAA77:0x105, taskId =
    [F2E11800-5003-11D5-BF99-AE145F2AAA77:0x105.7]) in application
    "MerlinWindows_cl41", pid 23406 on node laxrc2 in environment laxrc2.
    INFORMATION: Error parameters for Set:0 Msg:0:
    Class: qqsp_DistAccessException
    Detected at: qqcm_HoseFSM::ReceivedClose at 1
    Error Time: Thu May 24 09:16:47
    regards,
    Suma Venkatesh


  • Multiple Standby Databases on same RAC nodes.

    We have a 3 node Oracle 10gR2 RAC production environment on site A and a 3 node Oracle 10g RAC standby environment on site B. Both use like HW and OS - HP BL45p with RHEL AS 4.x.
    Can we have heterogeneous standby databases(logical and physical) running on the same RAC nodes?
    Can the 2 apply processes (MRP, managed recovery process, and LSP, logical standby process) coexist on the same set of nodes in a cluster at the same time? Are there any conflicts or limitations?
    Is there any documentation that supports this?

    Would Active Data Guard give you the best of both worlds?
    The caveat might be your SID_LIST_LISTENER setup.
    SID_LIST_LISTENER =
      (SID_LIST =
        (SID_DESC =
          (SID_NAME = PLSExtProc)
          (ORACLE_HOME = /u01/app/oracle/product/11.2.0)
          (PROGRAM = extproc)
        )
        (SID_DESC =
          (GLOBAL_DBNAME = <database1>_DGMGRL.yourdomain)
          (ORACLE_HOME = /u01/app/oracle/product/11.2.0)
          (SID_NAME = <database1>)
        )
        (SID_DESC =
          (GLOBAL_DBNAME = <database2>_DGMGRL.yourdomain)
          (ORACLE_HOME = /u01/app/oracle/product/11.2.0)
          (SID_NAME = <database2>)
        )
      )
    I have a server with ten standbys on it currently and no issues.
    As long as the version is the same you should be good.

  • Migration from Single Node to RAC Node

    Hi,
    We are planning to migrate the database from a single node to RAC. Is there any checklist to be considered regarding performance, system and database (or any other topic) before migrating to RAC?
    Thanks

    If it is already the same version, you will need to focus on any "single-threadedness" in the application, such as sequences that must be strictly sequential, "semaphore-style" locking, etc.
    In my experience, going to RAC, and in particular to ASM, can give a performance boost. If it is the same version (11gR2, for example), any statistics-related performance issues will still be there.

  • Huge number of idle connections from loopback ip on oracle RAC node

    Hi,
    We have a 2-node 11gR2 (11.2.0.3) Oracle RAC. We are seeing a huge number of idle connections (more than 5000 on each node) on both nodes, and the number is increasing day by day. All the idle connections are from the VIP and the loopback address (127.0.0.1.47971).
    netstat -an |grep -i idle|more
    127.0.0.1.47971 Idle
    any insight will be helpful.
    The server is suffering memory issues occasionally (once in a month).
    ORA-27300: OS system dependent operation:fork failed with status: 11
    ORA-27301: OS failure message: Resource temporarily unavailable
    Thanks

    We cannot control what occurs on your DB server.
    See "How do I ask a question on the forums?" and the "SQL and PL/SQL FAQ".
    Post the results of the following SQL:
    SELECT * FROM V$VERSION;
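
    To see which addresses hold the Idle entries, the netstat output from the question can be grouped by address. A minimal sketch, assuming a POSIX shell with awk and sort, and assuming the address is the first field in "a.b.c.d.port" form with the state in the last field, as in the sample line; the function name idle_by_address is made up for this example.

```shell
# Hypothetical helper: read `netstat -an` output on stdin and print
# "<count> <address>" for every address that has lines in state Idle,
# highest count first.
idle_by_address() {
    awk 'tolower($NF) == "idle" {
        addr = $1
        sub(/\.[0-9]+$/, "", addr)   # drop the trailing ".port"
        count[addr]++
    } END {
        for (a in count) print count[a], a
    }' | sort -rn
}
```

    It could be run as netstat -an | idle_by_address on each node, and the counts compared over a few days to see which address is growing.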

  • RAC node connected to outside DB  and pass 2 IP address

    Experts,
    We have a 4-node 11.1 RAC on Red Hat.
    As we know, each node has 3 IPs: public, VIP and private.
    It works well inside our domain network.
    But we have a problem when we try to connect to a client's database on an outside network.
    The connection passes 2 IP addresses to the client's firewall (according to the network monitor).
    The listener log shows that the connection is OK, but the connection is still blocked on the client's firewall side.
    The client's network staff told us that we passed two IP addresses during the connection.
    Could some expert explain why the RAC node's connection request passes two IPs to the client database?
    It was only discovered by the network staff. We could not see the 2 IPs in the listener log file.
    Is it our firewall NAT setting or the client's firewall NAT setting that is at fault?
    Thanks
    Jim
    Edited by: user589812 on Jan 21, 2010 2:25 PM

    Hi Experts,
    The two IP addresses being passed were those of the load balancer and of the DB server. The load balancer was supposed to mask its own IP address and pass only the DB IP address. Somehow, we were sending both IPs to the client database on the outside network, although it works well on the internal network. How can we stop the load balancer IP address from reaching the client's firewall and database server?
    I am looking for help!
    Jim

  • What would happen when one RAC node's public NIC is down?

    Dear all,
    There's a two-node RAC at my office. From what I have observed, when one RAC node's public NIC is down, the "crs_stat -t" command does not show any resource as OFFLINE. So at this moment, if I use sqlplus to connect to my RAC DB, it will still choose the node1 or node2 instance randomly, right? And if I am assigned to the instance on the node whose public NIC is down, will my sqlplus fail to connect to the DB?
    So, can we say that Oracle RAC can't provide HA if one node's public NIC is down? Or is there another way to solve this issue? Any suggestion would be appreciated.

    If the public NIC on one node is down, only the other node can take the load; in your case that would be node 2. It may happen that CRS is unable to record the status, in which case there may be failed connections to the affected node.

  • Solaris RAC nodes re-booting

    I have a pre-production 2-node cluster running on Solaris 10, Oracle 10.2.0.3 with the Oracle CRS, and using a NetApp filer as the shared storage.
    I also have a separate Solaris server running Grid Control 10.2.0.3, with the repository as one of the databases on the RAC (don't know if this is relevant to my problem).
    Periodically both RAC nodes reboot, with no trace of why (the GC server is fine). There is nothing logged in the Solaris logs (messages file), CRS logs, Oracle logs or the NetApp logs.
    All that is shown is the relevant service starting up following the shutdown.
    Has anyone any experience of this, or any thoughts on which component may cause such an issue?
    Thanks in advance
    Bob

    What type of Sun hardware are you using?
    Below is the Action Plan Oracle support sent me on my SR on this issue, not sure if any of this was provided to you or would be of help.
    ACTION PLAN
    ============
    1. There is nothing in the files at all that sheds any light on the issue.
    Again, 3 separate sets of clusters all losing all nodes at the same time is a very strange occurrence. Please be sure to have the admin look for
    anything in common with all clusters.
    2. Advise placing OSWatcher on the systems: Note.301137.1 Ext/Pub OS Watcher User Guide.
    If we should have another occurrence, we will want the OSWatcher logs from 1 hr before the issue through the issue.
    Also see if the Unix admin perhaps has any OS stats from this occurrence.
    3. Advise setting ntpd to run with the -x option. I do see that you are having negative time changes
    at times.
    -x will give us a slew rather than an abrupt time change.
    4. Advise setting this when you can.
    Please do the following:
    set the diagwait parameter:
    crsctl set css diagwait N [-force]
    Where N is the number of seconds to wait for a filesystem sync to
    complete (after this wait the node will reboot regardless of whether the
    sync has completed). This change must be made with the clusterware
    down, which will require the '-force', or with the stack up on just 1
    node, after which the stack on that node must be restarted before the
    stack starts up on any of the other nodes.
    N should be set to 25 (25 seconds).
    5. Advise that you have pcw mlr#6 Patch 5980915 on the systems as well,
    but I do not believe that this was an Oracle bug; the reason for applying the patch is the advanced diagnostics in that patch set.
    6. The two issues Sun is working on:
    Sun is working to resolve a time skew issue and a Solaris 10 kernel SIGALRM Sun#6292092 in addition to Sun#6595936.
    7. We do have a diagnostic oprocd that some sites have used, but on their test systems. It stops reboots and dumps information, but I have
    been hesitant to place it on production boxes. If you continue to have issues we may consider downloading the oprocd_skewfix_noreboot from
    Bug 6279879, but at this time I do not believe that is warranted.
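
    Step 4 of the action plan can be sketched as a dry run that only prints the command to review. This is a minimal sketch, assuming a POSIX shell; the value 25 and the -force condition come from the action plan, while the function name diagwait_command and its argument are made up for this example.

```shell
# Dry-run sketch: print the diagwait command from step 4 instead of running it.
# DIAGWAIT_SECONDS=25 is the value the action plan asks for.
DIAGWAIT_SECONDS=25

diagwait_command() {
    use_force=$1    # "yes" when the clusterware is down on all nodes
    if [ "$use_force" = "yes" ]; then
        echo "crsctl set css diagwait $DIAGWAIT_SECONDS -force"
    else
        echo "crsctl set css diagwait $DIAGWAIT_SECONDS"
    fi
}
```

    diagwait_command yes prints the form for a fully stopped clusterware; diagwait_command no prints the form for the case where the stack is up on just one node.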

  • Rac node failed how do you bring it back up?

    Example: If there are 3 RAC nodes and one becomes unavailable/fails, how do you bring it back up?

    There are typically two basic reasons why a RAC node will go down.
    A cluster issue, causing the node to be evicted. This usually means the node lost access to the cluster storage (e.g. no access to voting disks), or lost access to the Interconnect.
    An o/s issue causing the node to fail. E.g. a kernel panic due to a page swap and memory not syncing, or a soft CPU lockup, etc.
    You need to determine why it went down, and determine what is needed to enable it to successfully join the cluster again.

  • Rac 10gr2 upgrade

    Hi experts,
    We are prototyping a RAC database upgrade from 10.1.0 to 10.2.0.4. I just created a new directory as the 10.2.0 ORACLE_HOME. Since the 10gR1 database is running, all the environment variables set by the login profile point to the current 10.1.0 database configuration. I need advice on a few questions:
    1) At this point, I only want to use OUI to install 10gr2 binaries, and 10.2.0.4 binaries. Thought I could just kick off runInstaller to get it done. However, after reading Oracle documentation and a few articles, I am not sure if I should set any environmental variables.
    One article recommends
    . set ORACLE_BASE=/u01/app/oracle
    . set ORACLE_SID=orcl1 # Each RAC node must have a unique Oracle SID!
    . set LD_LIBRARY_PATH=$ORACLE_HOME/lib
    . unset ORACLE_HOME
    Others:
    . set ORACLE_HOME
    . set PATH
    . set LD_LIBRARY_PATH
    . set ORA_CRS_HOME
    . unset TNS_ADMIN
    I know I need to have all of them set properly prior to 10gr2 upgrade. Do I need to set any of them at all just to install 10gr2 binaries and patchset via OUI? If so, which ones?
    2) There are a few 10.2.0.4 patches that need to be applied with opatch to each cluster node.
    I noticed some of the post processing involves sql executions via sqlplus.
    My guess is to withhold these sql command steps temporarily, and execute them after upgrade to 10.2.0.4 completes. Is this the correct interpretation of the patch application?
    Thanks, Newbie

    There is a CRS PSU for CRS 10.2
    Patch# 8705958 - 10.2.0.4.2 for CRS PSU 2
    Should the PSU be applied to CRS prior to the database upgrade from 10.1 to 10.2? Can it be applied after the upgrade? What is the general guideline on this?
    Thanks.
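
    The ". set" lines quoted in question 1 are not valid Bourne-shell assignments as written. A minimal sketch of how the first article's list might look in sh syntax, using the values quoted in the question; the function name prepare_installer_env is made up for this example, LD_LIBRARY_PATH is left out because its quoted value depends on ORACLE_HOME, which the same list unsets, and whether OUI needs any of these settings remains the open question of the thread.

```shell
# Sketch of the first article's list in Bourne-shell syntax.
# Values are the ones quoted in the question, not a verified recommendation.
prepare_installer_env() {
    ORACLE_BASE=/u01/app/oracle
    ORACLE_SID=orcl1            # the article notes each RAC node needs a unique SID
    export ORACLE_BASE ORACLE_SID
    unset ORACLE_HOME           # the article's list ends by unsetting ORACLE_HOME
}
```

    Calling prepare_installer_env in the shell session that will launch runInstaller leaves the login profile itself untouched, so the running 10.1.0 environment is not affected in other sessions.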

  • How to execute DBMS_JOB at exactly one RAC node

    Hello,
    after unsuccessfully searching for "RAC" and "DBMS_JOB" I open this thread.
    Can you tell me how to dedicate one RAC-node for doing my batch-jobs
    which are started by using dbms_job (so there is no tnsnames.ora which is used).
    Thanks in advance

    hi,
    Let's say the instances are named I1, I2, I3, I4.
    Issue:
    ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0 SCOPE=BOTH SID='I1';
    ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0 SCOPE=BOTH SID='I2';
    ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0 SCOPE=BOTH SID='I3';
    ALTER SYSTEM SET JOB_QUEUE_PROCESSES=10 SCOPE=BOTH SID='I4';
    This way, only instance I4 will run jobs.
    Regards,
    Yoann.
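
    With more instances, the same statements can be generated rather than typed. A minimal sketch, assuming a POSIX shell; the function name job_queue_sql is made up for this example, and the values 0 and 10 are the ones used in the reply above.

```shell
# Hypothetical generator: the first argument is the instance that should run
# jobs, the remaining arguments are all instance names in the cluster.
job_queue_sql() {
    job_instance=$1
    shift
    for sid in "$@"; do
        if [ "$sid" = "$job_instance" ]; then
            processes=10
        else
            processes=0
        fi
        echo "ALTER SYSTEM SET JOB_QUEUE_PROCESSES=$processes SCOPE=BOTH SID='$sid';"
    done
}
```

    job_queue_sql I4 I1 I2 I3 I4 prints the four statements from the reply; the output can be reviewed and then run from a suitably privileged session.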
