RAC node upgrade issue
Our company's database runs on Oracle Real Application Clusters (RAC) with two nodes. We would like to perform hardware upgrades on both nodes. Is it OK to shut down one instance at a time and remove all the network/interconnect cables from that node while the other RAC node keeps working? After one node is upgraded and all the network/interconnect cables are reconnected, will everything work as before, or are there certain things to be cautious about?
Thanks in advance
I think you need to do a clean shutdown so that no instance recovery is required when you re-open the database.
SQL> SHUTDOWN TRANSACTIONAL
This allows all current transactions to complete and then shuts down the instance.
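A clean, one-node-at-a-time shutdown can be sketched as an ordered command plan. This is only a sketch: the database name `orcl`, instance name `orcl1` and node name `node1` are hypothetical, the `srvctl`/`crsctl` syntax is the 10g/11.1 style, and the ASM step applies only if the node runs ASM. Check the syntax against your own release before running anything.

```python
def rolling_shutdown_plan(db, instance, node, uses_asm=True):
    """Return the ordered shell commands to take one RAC node down cleanly.

    The names passed in are placeholders; nothing is executed here.
    """
    plan = [
        # Let in-flight transactions finish before the instance stops.
        f"srvctl stop instance -d {db} -i {instance} -o transactional",
    ]
    if uses_asm:
        plan.append(f"srvctl stop asm -n {node}")
    # VIP, listener, ONS and GSD on that node.
    plan.append(f"srvctl stop nodeapps -n {node}")
    # Run as root on the node being serviced.
    plan.append("crsctl stop crs")
    return plan


if __name__ == "__main__":
    for cmd in rolling_shutdown_plan("orcl", "orcl1", "node1"):
        print(cmd)
```

Once the hardware work is done and the cables are reconnected, starting the stack again with `crsctl start crs` and checking `crs_stat -t` on the surviving node is the natural counterpart.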
Similar Messages
-
Issues while Add / Delete RAC Node in Oracle 10g R2
Hi,
I have a requirement to add a new node to an existing 2-node RAC in production, where one node is active and the other is passive because of a licence restriction that prevents keeping both nodes active. Because of performance issues (memory, CPU cores, etc.) we are adding another new node.
We plan to add a third database node, make the new node active and the currently active one passive (a swap), and later, after final observation, delete and decommission the current passive node.
This activity has been tested on the Dev database with the same infrastructure (OS, memory, etc.), but we want to know the best approach and the challenges we may face during RAC node addition/deletion.
RAC DB Version : 10.2.0.4
OS Version : RHEL 5.8
(1) Is the approach the right one: first add the node and later delete the other?
(2) If the approach is correct, what would be the behaviour of the 3rd node in terms of active or passive?
(3) We have taken an RMAN backup, an OS backup, and backups of CRS, ORACLE_HOME, ASM_HOME, OCR and the voting disk.
(4) Could you please give detailed steps for adding / deleting a node in 10g R2?
(5) Are there any known bugs with this DB release or OS when performing this activity?
Since this is a production machine we want to be more proactive. Please correct me or add anything I am missing.
With Thanks,
Rakesh
Hello Rakesh,
Please follow these steps.
Node Addition Steps
1. Install and configure OS and hardware for new node.
2. Add Oracle Clusterware to the new node.
3. Configure ONS for the new node.
4. Add ASM home to the new node.
5. Add Database home to the new node.
6. Add a listener to the new node.
7. Add ASM instance to the New Node.
8. Add a database instance to the new node.
Details of steps
1. Run cluvfy to verify whether the new node is ready to be added.
$ cluvfy stage -pre crsinst -n node2
2. From node1, execute
$ /u01/app/crs11g/oui/bin/addNode.sh
3. Specify the node2 VIP address and follow the instructions.
4. At the end of the installation it may throw a warning and ask you to click YES. Click YES.
5. From node1,
/u01/app/crs11g/bin/racgons add_config node2:6200
6. From node1, set ORACLE_HOME=ASM_HOME, then execute addNode.sh from $ASM_HOME/oui/bin and follow the instructions.
7. From node1, set ORACLE_HOME=DB_HOME and then run
/u01/app/oracle/product/11.1.0/db_1/oui/bin/addNode.sh
and follow the instructions.
8. From node2, start NETCA and configure a listener for the new node. While configuring the listener, select the name of the new node.
9. From node1, start DBCA from the ASM home to configure the ASM instance for the new node.
10. Again from node1, start DBCA from the DB home to add the DB instance.
Node deletion Steps
1. Delete the Database instance on the node to be deleted.
2. Clean up the ASM instance.
3. Remove the listener from the node to be deleted.
4. Remove the node from the database.
5. Remove the node from ASM.
6. Remove ONS configuration from the node to be deleted.
7. Remove the node from the clusterware
Details of Steps
1. Remove database Instance of node2
DBCA -> Instance Management -> Delete an instance -> password for SYS -> select node -> Finish.
2. Stop ASM for node2 from any node.
$ srvctl stop asm -n node2
3. Remove asm for node2
$ srvctl remove asm -n node2
4. Remove Listener from Node2 using NETCA.
5. From Node2:
./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=node2" -local
6. From Node2, start runInstaller from Oracle_DB_Home/oui/bin and remove "DB_HOME"
$ ./runInstaller
On the Welcome screen -> Deinstall Products -> select the DB home name (OraDb10g_Home1) -> Remove
7. From Node1:
./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=node1"
8. From Node2, set Oracle_Home to asm_1 and then fire:
./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=node2" -local
9. From Node2, start OUI and deinstall ASM Home.
10. From Node1, Set ORACLE_HOME= /u01/app/oracle/product/11.1.0/asm_1
11. From Node1: from /u01/app/oracle/product/11.1.0/asm_1/oui/bin, start OUI
./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=node1"
12. From Node2: as a root user (#) execute rootdelete.sh from /u01/app/crs11g/install
# /u01/app/crs11g/install/rootdelete.sh
13. From Node-1, first find out the node numbers
# /u01/app/crs11g/bin/olsnodes -n
output : node1 1
node2 2
14. From Node-1 as a root user (#):
# /u01/app/crs11g/install/rootdeletenode.sh node2,2    (arguments: node_name,node_number)
output:
CRS nodeapps are deleted successfully
clscfg: EXISTING configuration version 4 detected.
clscfg: version 4 is 11 Release 1.
Node deletion operation successful.
'node2' deleted successfully
15. From Node2, set ORACLE_HOME=CRS_HOME and then execute
$ $ORACLE_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=node2" CRS=TRUE -local
16. Start ./runInstaller and remove CRS_HOME.
17. From Node-1:
$ /u01/app/crs11g/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES=node1" CRS=TRUE
18. Check that the node has been deleted with ./crs_stat -t -
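For the node deletion steps, the node number that rootdeletenode.sh needs can be read from the `olsnodes -n` output instead of being typed by hand. The sketch below only parses text and builds the argument; it assumes the `node_name,node_number` argument form, and the function names are made up for illustration.

```python
def parse_olsnodes(output):
    """Map node name -> node number from the text printed by `olsnodes -n`."""
    nodes = {}
    for line in output.splitlines():
        parts = line.split()
        # Expect "<name> <number>"; skip anything else.
        if len(parts) >= 2 and parts[1].isdigit():
            nodes[parts[0]] = int(parts[1])
    return nodes


def rootdeletenode_args(nodes, node_name):
    """Build the 'name,number' argument (assumed form) for rootdeletenode.sh."""
    return f"{node_name},{nodes[node_name]}"


if __name__ == "__main__":
    sample = "node1 1\nnode2 2"
    nodes = parse_olsnodes(sample)
    print(rootdeletenode_args(nodes, "node2"))
```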
EBS 12.1.3 Upgrade issue
Hi,
I installed 12.1.1 with the Vision database on my machine, which runs 64-bit Linux 5 Update 5, and everything went fine. After that I decided to upgrade to 12.1.3. I installed the following patches in the order given below:
1) R12.AD.B.DELTA.3 Patch 9239089
2) Patch 9239090
3) Patch 9239095
4) 9817770:R12.ATG_PF.B POST-R12.ATG_PF.B.DELTA.3 CONSOLIDATED PATCH
5) 9966055:R12.FND.B [TRANSLATED VERSION OF FNDSCSGN NOT LAUNCHED
The only issue I had was during the compilation of one form library: I got an error in Patch 9239090 and chose to continue. The patches completed successfully (I think), since I can log into the applications. My issue is that I have a lot of invalid objects. I tried to compile them manually but failed, then followed one of the other threads and ran the utlrp.sql and utlirp.sql scripts as per (R12.1.1 - Invalid Objects After Patching [ID 1093163.1]).
I followed that document, but my invalid count stays the same. Here is a snapshot of what I get when I query the database:
SQL> SELECT COUNT(*), OWNER FROM DBA_OBJECTS WHERE STATUS='INVALID'
  2  GROUP BY OWNER;
  COUNT(*) OWNER
---------- ------------
         1 RE
         2 CA
         4 PUBLIC
         1 HERMAN
       228 APPS
         2 FLOWS_010500
SQL> @$ORACLE_HOME/rdbms/admin/utlrp.sql
TIMESTAMP
COMP_TIMESTAMP UTLRP_BGN 2012-04-05 07:23:05
DOC> The following PL/SQL block invokes UTL_RECOMP to recompile invalid
DOC> objects in the database. Recompilation time is proportional to the
DOC> number of invalid objects in the database, so this command may take
DOC> a long time to execute on a database with a large number of invalid
DOC> objects.
DOC>
DOC> Use the following queries to track recompilation progress:
DOC>
DOC> 1. Query returning the number of invalid objects remaining. This
DOC> number should decrease with time.
DOC> SELECT COUNT(*) FROM obj$ WHERE status IN (4, 5, 6);
DOC>
DOC> 2. Query returning the number of objects compiled so far. This number
DOC> should increase with time.
DOC> SELECT COUNT(*) FROM UTL_RECOMP_COMPILED;
DOC>
DOC> This script automatically chooses serial or parallel recompilation
DOC> based on the number of CPUs available (parameter cpu_count) multiplied
DOC> by the number of threads per CPU (parameter parallel_threads_per_cpu).
DOC> On RAC, this number is added across all RAC nodes.
DOC>
DOC> UTL_RECOMP uses DBMS_SCHEDULER to create jobs for parallel
DOC> recompilation. Jobs are created without instance affinity so that they
DOC> can migrate across RAC nodes. Use the following queries to verify
DOC> whether UTL_RECOMP jobs are being created and run correctly:
DOC>
DOC> 1. Query showing jobs created by UTL_RECOMP
DOC> SELECT job_name FROM dba_scheduler_jobs
DOC> WHERE job_name like 'UTL_RECOMP_SLAVE_%';
DOC>
DOC> 2. Query showing UTL_RECOMP jobs that are running
DOC> SELECT job_name FROM dba_scheduler_running_jobs
DOC> WHERE job_name like 'UTL_RECOMP_SLAVE_%';
DOC>#
PL/SQL procedure successfully completed.
TIMESTAMP
COMP_TIMESTAMP UTLRP_END 2012-04-05 07:24:29
DECLARE
ERROR at line 1:
ORA-00904: "FALSE": invalid identifier
ORA-06512: at line 13
DOC> The following query reports the number of objects that have compiled
DOC> with errors (objects that compile with errors have status set to 3 in
DOC> obj$). If the number is higher than expected, please examine the error
DOC> messages reported with each object (using SHOW ERRORS) to see if they
DOC> point to system misconfiguration or resource constraints that must be
DOC> fixed before attempting to recompile these objects.
DOC>#
OBJECTS WITH ERRORS
4
DOC> The following query reports the number of errors caught during
DOC> recompilation. If this number is non-zero, please query the error
DOC> messages in the table UTL_RECOMP_ERRORS to see if any of these errors
DOC> are due to misconfiguration or resource constraints that must be
DOC> fixed before objects can compile successfully.
DOC>#
ERRORS DURING RECOMPILATION
4
PL/SQL procedure successfully completed.
Invoking Ultra Search Install/Upgrade validation procedure VALIDATE_WK
Ultra Search VALIDATE_WK done with no error
PL/SQL procedure successfully completed.
However, for some reason utlrp.sql reports only 4 invalid objects when I run it. The number changes once the script has finished, and I am not sure why. If I query SELECT COUNT(*) FROM obj$ WHERE status IN (4, 5, 6) after utlrp.sql has finished, I get the following:
SQL> SELECT COUNT(*) FROM obj$ WHERE status IN (4, 5, 6);
COUNT(*)
234
But if I run the same query while utlrp.sql is running, I have seen the number go as low as 10; once the script finishes it goes back up. I don't know what to do. I have run AutoConfig on both the DB and apps tiers. Is there anything anyone can suggest? I am totally lost on this.
Thanks..
Hi, I set _disable_fast_validate=TRUE and re-ran only utlrp.sql to recompile. Now a lot more packages compile, but I still have some invalid objects. Can I ignore these and move on, or are they important packages/objects in APPS that may impact functionality?
Below are the remaining invalid objects.
Thanks
OWNER,OBJECT_NAME,SUBOBJECT_NAME,OBJECT_ID,DATA_OBJECT_ID,OBJECT_TYPE,CREATED,LAST_DDL_TIME,TIMESTAMP,STATUS,TEMPORARY,GENERATED,SECONDARY,NAMESPACE,EDITION_NAME
PUBLIC,WWV_FLOW_LIST_OF_VALUES_DATA,,1006661,,SYNONYM,09-FEB-07,09-FEB-07,2007-02-09:00:33:03,INVALID,N,N,N,1,ORA$BASE
PUBLIC,WWV_FLOW_LISTS_OF_VALUES$,,1006663,,SYNONYM,09-FEB-07,09-FEB-07,2007-02-09:00:33:03,INVALID,N,N,N,1,ORA$BASE
PUBLIC,WWV_FLOW_GENERIC,,1006687,,SYNONYM,09-FEB-07,09-FEB-07,2007-02-09:00:33:03,INVALID,N,N,N,1,ORA$BASE
PUBLIC,WWV_FLOW_FIELD_TEMPLATES,,1006699,,SYNONYM,09-FEB-07,09-FEB-07,2007-02-09:00:33:03,INVALID,N,N,N,1,ORA$BASE
RE,RE_PROFILER,,1207268,0,PACKAGE BODY,17-NOV-04,05-APR-12,2012-04-05:10:52:51,INVALID,N,N,N,2,
HERMAN,RDT_1,,1713973,1713973,TABLE,18-JAN-06,18-JAN-06,2006-01-18:13:18:02,INVALID,N,N,N,1,
APPS,XLA_00707_AAD_C_000026_PKG,,2362389,0,PACKAGE BODY,03-AUG-07,05-APR-12,2012-04-05:10:52:52,INVALID,N,N,N,2,
APPS,XLA_20065_AAD_C_000030_PKG,,2370236,0,PACKAGE BODY,16-AUG-07,05-APR-12,2012-04-05:10:53:05,INVALID,N,N,N,2,
APPS,FSAH_DUPLICATE_PKG,,2385307,0,PACKAGE BODY,23-AUG-07,05-APR-12,2012-04-05:10:53:08,INVALID,N,N,N,2,
APPS,XLA_00707_AAD_C_000044_PKG,,2661674,0,PACKAGE BODY,14-JAN-08,05-APR-12,2012-04-05:10:53:08,INVALID,N,N,N,2,
APPS,MSD_DEM_OBI_DEMANTRA_MV,,3255317,,MATERIALIZED VIEW,09-JUL-08,09-JUL-08,2008-07-09:09:14:28,INVALID,N,N,N,19,
CA,F,,3260665,3260665,TABLE,24-SEP-08,24-SEP-08,2008-09-24:16:00:14,INVALID,N,N,N,1,
CA,G,,3260683,3260683,TABLE,24-SEP-08,24-SEP-08,2008-09-24:16:00:15,INVALID,N,N,N,1, -
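To judge which of the remaining invalid objects matter, it can help to group an export like the one above by owner and object type. The following is a minimal sketch that assumes the listing has been saved as CSV text with the same header; the function name is hypothetical and the sample rows are copied from the listing.

```python
import csv
import io
from collections import Counter

HEADER = ("OWNER,OBJECT_NAME,SUBOBJECT_NAME,OBJECT_ID,DATA_OBJECT_ID,OBJECT_TYPE,"
          "CREATED,LAST_DDL_TIME,TIMESTAMP,STATUS,TEMPORARY,GENERATED,SECONDARY,"
          "NAMESPACE,EDITION_NAME")

SAMPLE = "\n".join([
    HEADER,
    "PUBLIC,WWV_FLOW_GENERIC,,1006687,,SYNONYM,09-FEB-07,09-FEB-07,"
    "2007-02-09:00:33:03,INVALID,N,N,N,1,ORA$BASE",
    "APPS,XLA_00707_AAD_C_000026_PKG,,2362389,0,PACKAGE BODY,03-AUG-07,05-APR-12,"
    "2012-04-05:10:52:52,INVALID,N,N,N,2,",
    "APPS,FSAH_DUPLICATE_PKG,,2385307,0,PACKAGE BODY,23-AUG-07,05-APR-12,"
    "2012-04-05:10:53:08,INVALID,N,N,N,2,",
])


def summarize_invalid(csv_text):
    """Count INVALID objects per (owner, object_type) in a DBA_OBJECTS CSV export."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["STATUS"] == "INVALID":
            counts[(row["OWNER"], row["OBJECT_TYPE"])] += 1
    return counts


if __name__ == "__main__":
    for (owner, obj_type), n in sorted(summarize_invalid(SAMPLE).items()):
        print(owner, obj_type, n)
```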
Hi,
One of our RAC environments keeps restarting.
I've disabled init.cssd, init.crs and init.evmd in /etc/inittab in order to check the logs.
This is the situation:
crsd.log:
2009-02-04 00:09:00.118: [ COMMCRS][9]clsc_connect: (8000000100318640) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_node1_loud))
2009-02-04 00:09:00.132: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
2009-02-04 00:09:00.134: [ CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2009-02-04 00:09:08.016: [ CRSD][1]32Daemon Version: 10.2.0.2.0 Active Version: 10.2.0.2.0
2009-02-04 00:09:08.016: [ CRSD][1]32Active Version and Software Version are same
2009-02-04 00:09:08.017: [ CRSMAIN][1]32Initializing OCR
2009-02-04 00:09:08.037: [ OCRRAW][1]proprioo: for disk 0 (/dev/rdsk/ora_ocr_raw), id match (1), my id set (752560621,1028247821) total id sets (1), 1st set
(752560621,1028247821), 2nd set (0,0) my votes (2), total votes (2)
2009-02-04 00:09:08.140: [ CSSCLNT][24]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
ocssd.log:
[ CSSD]2009-02-03 21:52:08.651 [9] >USER: clssnmHandleUpdate: NODE 1 (node1l) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2009-02-03 21:52:08.651 [9] >TRACE: clssnmHandleUpdate: diskTimeout set to (200000)ms
[ CSSD]2009-02-03 21:52:08.651 [16] >TRACE: clssnmWaitForAcks: done, msg type(15)
[ CSSD]2009-02-03 21:52:08.651 [16] >TRACE: clssnmDoSyncUpdate: Sync Complete!
[ CSSD]2009-02-03 21:52:08.722 [1] >USER: NMEVENT_SUSPEND [00][00][00][00]
[ CSSD]2009-02-03 21:52:08.724 [17] >TRACE: clssgmReconfigThread: started for reconfig (1)
[ CSSD]2009-02-03 21:52:08.749 [17] >USER: NMEVENT_RECONFIG [00][00][00][02]
[ CSSD]2009-02-03 21:52:08.749 [17] >TRACE: clssgmEstablishConnections: 1 nodes in cluster incarn 1
[ CSSD]2009-02-03 21:52:08.751 [13] >TRACE: clssgmPeerListener: connects done (1/1)
[ CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmEstablishMasterNode: MASTER for 1 is node(1) birth(1)
[ CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmChangeMasterNode: requeued 0 RPCs
[ CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmMasterCMSync: Synchronizing group/lock status
[ CSSD]2009-02-03 21:52:08.752 [17] >TRACE: clssgmMasterSendDBDone: group/lock status synchronization complete
[ CSSD]CLSS-3000: reconfiguration successful, incarnation 1 with 1 nodes
[ CSSD]CLSS-3001: local node number 1, master node number 1
[ CSSD]2009-02-03 21:52:08.753 [17] >TRACE: clssgmReconfigThread: completed for reconfig(1), with status(1)
[ CSSD]2009-02-03 21:52:08.863 [10] >TRACE: clssgmClientConnectMsg: Connect from con(80000001008fd2a0) proc(8000000100ae26a8) pid() proto(10:2:1:1)
[ CSSD]2009-02-03 21:52:08.864 [10] >TRACE: clssgmClientConnectMsg: Connect from con(8000000100ae0128) proc(8000000100ae2a10) pid() proto(10:2:1:1) from con(8000000100aa32c0) proc(8000000100aa5b90) pid() proto(10:2:1:1)
alertlog:
[cssd(2535)]CRS-1601:CSSD Reconfiguration complete. Active nodes are node1 .
2009-02-03 23:55:20.821
[cssd(2575)]CRS-1605:CSSD voting file is online: /dev/rdsk/ora_voting_raw. Details in /work/crs/product/10.2/crs/log/lourmel/cssd/ocssd.log.
2009-02-03 23:55:28.376
evmd.log:
Oracle Database 10g CRS Release 10.2.0.2.0 Production Copyright 1996, 2004, Oracle. All rights reserved
2009-02-04 00:08:58.331: [ EVMD][1]32EVMD waiting for CSS to be ready err = 3
2009-02-04 00:08:59.939: [ COMMCRS][9]clsc_connect: (800000010007d658) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_node1_loud))
2009-02-04 00:08:59.946: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
2009-02-04 00:08:59.948: [ EVMD][1]32EVMD waiting for CSS to be ready err = 3
2009-02-04 00:09:07.596: [ CSSCLNT][1]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
syslog:
Feb 4 00:08:41 lourmel syslog: Oracle Cluster Ready Services starting up automatically.
Feb 4 00:08:45 lourmel sfd[2153]: starting the daemon.
Feb 4 00:08:45 lourmel su: + tty?? root-orac
Feb 4 00:08:45 lourmel krsd[2152]: Delay time is 300 seconds
Feb 4 00:08:43 lourmel syslog: Oracle Cluster Ready Services starting up automatically.
Feb 4 00:08:52 lourmel above message repeats 2 times
Feb 4 00:08:52 lourmel syslog: Cluster Ready Services completed waiting on dependencies.
Feb 4 00:08:53 lourmel syslog: Running CRSD with TZ =
When I ran crs_stat (before the restart) I got the message:
CRS-0184: Cannot communicate with the CRS daemon.
crsctl check crs gives us:
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM
As I said before, the machine keeps restarting.
Does anyone have an idea? Please.
Dear All,
I recently upgraded a few RAC setups to Oracle 10g Patchset 3 (10.2.0.4) on Linux servers.
In one of the RAC setups, the servers were rebooting daily. The same setup had been working fine, and the problem started only after applying the patchset. I checked all the logs and found nothing relevant.
Then I checked what had been added with this patchset.
The most interesting finding: Oracle added a new daemon, oprocd.
# ps -efl | grep oprocd
4 S root 6440 6063 0 -40 - - 2114 - Mar03 ? 00:00:00 /opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 500 -hsi 5:10:50:75:90 -f
These are the interesting points about the line above:
1. The process runs as the root user
2. It runs with the highest priority (-40)
3. It probes every second (-t 1000)
4. It waits up to 500 milliseconds for a CPU response (-m 500 means the margin time is 500 milliseconds)
5. It runs in fatal mode (-f)
My conclusion from these points: this daemon probes the CPU every second and waits for a response within 500 milliseconds. If it gets no response within 500 milliseconds, it assumes the CPU is hung and reboots the machine. The operating system does not get enough time to write the system logs before the server reboots.
So the solution is to increase the margin time from 500 milliseconds to 10 seconds.
These are the steps to increase the margin time.
Please remember: the modification needs downtime, and you need to stop the cluster service on all member nodes.
1. Stop the CRS processes
#crsctl stop crs
#<CRS_HOME>/bin/oprocd stop
2. Ensure that the Clusterware stack is down and not running
#ps -ef |egrep "crsd.bin|ocssd.bin|evmd.bin|oprocd"
This should return no processes.
3. From one node of the cluster, change the value of the "diagwait" parameter to 13 by issuing the command as root:
#crsctl set css diagwait 13 -force
4. Check if diagwait is successfully set.
#crsctl get css diagwait
5. Restart the Oracle Clusterware on all the nodes by executing:
#crsctl start crs
(Note: if you face any problem restarting the CRS services, ASM and the database, you can reboot the nodes. The cluster and database will come up automatically because of the init startup scripts.)
6. The oprocd daemon process will now show -m 10000
# ps -efl| grep oprocd
# 4 S root 6440 6063 0 -40 - - 2114 - Feb02 ? 00:00:00 /opt/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 10000 -hsi 5:10:50:75:90 -f
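To confirm the change on every node, the interval and margin can be pulled out of the `ps` line rather than read by eye. A small sketch, assuming the argument layout shown in the output above (`-t <ms> -m <ms>`); the function name is hypothetical.

```python
import re


def oprocd_timing(ps_line):
    """Return (interval_ms, margin_ms) from an oprocd.bin command line."""
    interval = re.search(r"\s-t\s+(\d+)", ps_line)
    margin = re.search(r"\s-m\s+(\d+)", ps_line)
    if not (interval and margin):
        raise ValueError("no oprocd -t/-m arguments found")
    return int(interval.group(1)), int(margin.group(1))


if __name__ == "__main__":
    line = ("4 S root 6440 6063 0 -40 - - 2114 - Feb02 ? 00:00:00 "
            "/opt/oracle/product/10.2.0/crs/bin/oprocd.bin run "
            "-t 1000 -m 10000 -hsi 5:10:50:75:90 -f")
    interval_ms, margin_ms = oprocd_timing(line)
    print(f"interval={interval_ms} ms, margin={margin_ms} ms")
```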
Rollback Procedure
If you need to unset the oprocd value for any reason:
#crsctl unset css diagwait
I am confident the abnormal RAC node restart problem will be solved with this workaround.
Regards,
Sumit
Bangalore,India -
What is the best use of a 1400 GB SGA (2 RAC nodes, 768 GB each)
We are currently using 11.2.0.3.0 on a Sun Unix server with 2 RAC nodes, each with 8 UltraSPARC-T1 CPUs (released in 2005), four threads each, so Oracle sees 32 CPUs, and they are very slow (1.2 gb). The database is 4 TB in size on a regular SAN (10k disks).
8 GB SGA.
Our new boss wants to upgrade the system to the max to get the best performance possible. Money is a concern of course, but the budget is pretty high. Our use case is 12-16 concurrent users running reports, some small and others very large (returning a single row or tens of thousands of rows). Reports take 5 seconds to 5 minutes. Our job is to get the fastest system possible. We have a total of 8 licenses available, so we can have 16 cores. We are also getting a 6 TB all-flash SSD array for the database. We can get any CPU we want, but we can't use parallel query because of all kinds of issues we have experienced (too many slaves, RAC interconnect saturation, etc.; whack-a-mole). SPARC has too many threads, and without parallel query Oracle runs a query in a single thread.
We have specced out the following system for each RAC node:
HP ProLiant DL380p Gen8 8 SFF server
2 Intel Xeon E5-2637v2 3.5GHz/4-core CPUs
768 GB RAM
2 HP 300GB 6G SAS 15K drives for database software
This will give us a total of 4 Xeon E5-2637v2 CPUs, 16 cores in total (0.5 core factor for 8 licenses), and 1536 GB of RAM (leaving ~1400 GB for the SGA). This will guarantee an available core for each user. We intend to create a very large keep pool, around 300 GB on each node, that will hold all our dimension tables. We hope this will reduce reads from the SSD to just data from the fact tables.
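The core, license and memory arithmetic in this proposal can be checked with a few lines. This is only a worked calculation using the figures quoted in the post (2 nodes, 2 sockets per node, 4 cores per socket, 768 GB per node, 0.5 core factor); the function name is made up for illustration.

```python
def sizing(nodes, sockets_per_node, cores_per_socket, ram_gb_per_node, core_factor):
    """Total cores, processor licenses needed and total RAM for the cluster."""
    cores = nodes * sockets_per_node * cores_per_socket
    return {
        "cores": cores,
        "licenses": cores * core_factor,
        "ram_gb": nodes * ram_gb_per_node,
    }


if __name__ == "__main__":
    # 2 nodes x 2 CPUs x 4 cores = 16 cores; 16 x 0.5 = 8 licenses; 2 x 768 = 1536 GB
    print(sizing(nodes=2, sockets_per_node=2, cores_per_socket=4,
                 ram_gb_per_node=768, core_factor=0.5))
```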
Are we doing massive overkill here? The budget for this was way less than what our boss expected. Will an SGA that big be wasted, and would, say, 256 GB be fine? Or will Oracle take advantage of it and be able to keep most blocks in there?
Will an SGA that big cause Oracle problems due to the overhead of handling that much RAM?
Current System:
===========
a. Version : 11.2.0.3
b. Unix Sun
c. CPU - 8 cpus with 4 threads => 32 logical cpus or cores
d. database 4TB
e. SAN - 10k speed disk drives
f. 8gb SGA
g. 1.2 gb ??
h. Users --> 12-16 concurrent and run reports varying size
i. reports elapsed time 5 sec to 5 mins
j. cpu license -->8
Target System
===========
a. Version: 11.2.0.3
b. HP ProLiant DL380p Gen8 8 SFF server
c. RAM --> 768 GB
d. 2 HP 300GB 6G SAS 15K drives for database software
e. large keep pool -->90 gb to hold all dimension tables.
f. SSD to just data from fact tables
g. SGA -->256gb
A reassessment of the performance issues of the current system appears to be required. A good performance tuning expert should look into the tuning issues of the current application by analyzing AWR performance metrics. If an 8 GB SGA is not enough, the reason is that the queries running in the system do not have good access paths that select less data, so recently used buffers from the tables involved in the queries are flushed out. Until those issues are identified, the performance problem won't go away wherever you go; as table sizes increase in future, the problem will reappear. If the queries really do run with many full scans, then re-platforming to Exadata might be the right decision, as Exadata has Smart Scan and cell offloading, which work faster and might be the right direction for best performance and the best investment for the future. Compression (COMPRESS FOR OLTP) could be another feature to exploit to improve efficiency further by reading fewer blocks in less time.
Investment in infrastructure will solve a few issues in the short term, but in the long term the issues will arise again.
Investment in identifying the performance issues of the current system would be the best investment in the current scenario. -
Errors in Forte 3.5 -Upgrade issue
hello ,
We are having frequent disruptions in the communication SO, which has an
externalConnection class. Did anything change in 3.5? We did not have
these errors in 3L!
After these errors, we have a mutex locking problem which leaves our
application hanging.
Any help will be greatly appreciated!
thanks
suma
Here is an Excerpt of the log file
Task 9: extConn4030a808.Write: 194 bytes written to 10
INFORMATION: Network partner closed connection. This usually means the
process at the other end of the wire failed. Please go look there and
find
out why.
Class: qqsp_DistAccessException
Error #: [501, 152]
Detected at: qqcm_HoseFSM::ReceivedClose at 2
Error Time: Thu May 24 09:16:29
Exception occurred (locally) on partition "MerlinWindows_cl41_Part8",
(partitionId = F2E11800-5003-11D5-BF99-AE145F2AAA77:0x105, taskId =
[F2E11800-5003-11D5-BF99-AE145F2AAA77:0x105.7]) in application
"MerlinWindows_cl41", pid 23406 on node laxrc2 in environment laxrc2.
INFORMATION: Error parameters for Set:0 Msg:0:
Class: qqsp_DistAccessException
Detected at: qqcm_HoseFSM::ReceivedClose at 1
Error Time: Thu May 24 09:16:29
Exception occurred (locally) on partition "MerlinWindows_cl41_Part8",
(partitionId = F2E11800-5003-11D5-BF99-AE145F2AAA77:0x105, taskId =
[F2E11800-5003-11D5-BF99-AE145F2AAA77:0x105.7]) in application
24-May-2001 09:16:47: VH.G.G..GUIT/GUI0/PLAXRC172.23.4.401/../..;PGYX;
Task 9: extConn4030a808.Write: 233 bytes written to 10
INFORMATION: Network partner closed connection. This usually means the
process at the other end of the wire failed. Please go look there and
find
out why.
Class: qqsp_DistAccessException
Error #: [501, 152]
Detected at: qqcm_HoseFSM::ReceivedClose at 2
Error Time: Thu May 24 09:16:47
Exception occurred (locally) on partition "MerlinWindows_cl41_Part8",
(partitionId = F2E11800-5003-11D5-BF99-AE145F2AAA77:0x105, taskId =
[F2E11800-5003-11D5-BF99-AE145F2AAA77:0x105.7]) in application
"MerlinWindows_cl41", pid 23406 on node laxrc2 in environment laxrc2.
INFORMATION: Error parameters for Set:0 Msg:0:
Class: qqsp_DistAccessException
Detected at: qqcm_HoseFSM::ReceivedClose at 1
Error Time: Thu May 24 09:16:47
regards,
Suma Venkatesh -
Multiple Standby Databases on same RAC nodes.
We have a 3 node Oracle 10gR2 RAC production environment on site A and a 3 node Oracle 10g RAC standby environment on site B. Both use like HW and OS - HP BL45p with RHEL AS 4.x.
Can we have heterogeneous standby databases (logical and physical) running on the same RAC nodes?
Can the two apply processes (MRP, the managed recovery process, and LSP, the logical standby process) coexist on the same set of nodes in a cluster at the same time? Are there any conflicts or limitations?
Is there any documentation that supports this?
Would Active Data Guard give you the best of both worlds?
The caveat might be your SID_LIST_LISTENER setup.
SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = /u01/app/oracle/product/11.2.0)
      (PROGRAM = extproc)
    )
    (SID_DESC =
      (global_dbname = <database1>_DGMGRL.yourdomain)
      (ORACLE_HOME = /u01/app/oracle/product/11.2.0)
      (sid_name = <database1>)
    )
    (SID_DESC =
      (global_dbname = <database2>_DGMGRL.yourdomain)
      (ORACLE_HOME = /u01/app/oracle/product/11.2.0)
      (sid_name = <database2>)
    )
  )
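With many standbys on one server, a SID_LIST like this gets long, and a missing closing parenthesis is easy to overlook. One way to keep it consistent is to generate the entries; the sketch below is illustrative only, with a hypothetical function name and placeholder database names and domain, and it emits just the static `_DGMGRL` entries.

```python
def sid_list(oracle_home, databases, domain):
    """Render a SID_LIST_LISTENER block with one SID_DESC per standby database."""
    lines = ["SID_LIST_LISTENER =", "  (SID_LIST ="]
    for db in databases:
        lines += [
            "    (SID_DESC =",
            f"      (GLOBAL_DBNAME = {db}_DGMGRL.{domain})",
            f"      (ORACLE_HOME = {oracle_home})",
            f"      (SID_NAME = {db})",
            "    )",
        ]
    lines.append("  )")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder names; substitute your own standby SIDs and domain.
    print(sid_list("/u01/app/oracle/product/11.2.0",
                   ["standby1", "standby2"], "example.com"))
```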
I have a server with ten standbys on it currently and no issues.
As long as the version is the same you should be good. -
Migration from Single Node to RAC Node
Hi,
We are planning to migrate the database from a single node to RAC. Is there any checklist to consider regarding performance, system and database (or any other topic) before migrating to RAC?
Thanks
If it is already the same version, you will need to focus on any "single-threadedness" in the application, such as sequences that must be strictly sequential, "semaphore-style" locking, etc.
My experience is that going RAC and in particular to ASM can give a performance boost. If it is the same version (11gR2 for example) then any statistics-related performance issues will still be there. -
Huge number of idle connections from loopback ip on oracle RAC node
Hi,
We have a 2-node 11gR2 (11.2.0.3) Oracle RAC. We are seeing a huge number of idle connections (more than 5000 on each node) on both nodes, increasing day by day. All the idle connections are from the VIP and the loopback address (127.0.0.1.47971).
netstat -an |grep -i idle|more
127.0.0.1.47971 Idle
Any insight will be helpful.
The server occasionally (about once a month) suffers memory issues.
ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
Thanks
we can not control what occurs on your DB Server.
How do I ask a question on the forums?
SQL and PL/SQL FAQ
post results from following SQL
SELECT * FROM V$VERSION; -
RAC node connected to outside DB and pass 2 IP address
Experts,
We have a 4-node 11.1 RAC on Red Hat.
As we know, each node has 3 IPs: public, VIP and private.
It works well inside the internal domain network.
But we have a problem when we try to connect to a client's database on an outside network.
The connection passes 2 IPs to the client firewall (based on the network monitor).
The listener log shows that the connection is OK, but the connection is still blocked on the client's firewall side.
The client's network staff told us that we passed two IP addresses during the connection.
Could some experts explain why the RAC node's connection request passes two IPs to the client database?
It was only discovered by the network staff; we cannot see the 2 IPs in the listener log file.
Is it an issue with our firewall NAT settings, or with the client's firewall NAT settings?
Thanks
Jim
Edited by: user589812 on Jan 21, 2010 2:25 PM
Hi Experts,
The two IP addresses being passed were one from the load balancer and one from the DB server. The load balancer was supposed to mask its own IP address and pass only the DB IP address. Somehow, we were sending both IPs to the client database on the outside network. It works well on the internal network side. How can we stop the load balancer IP address from reaching the client network firewall and the client database server?
I am looking for help!
Jim -
What would happen when one RAC node's public NIC is down?
Dear all,
There's a two-node RAC in my office. From my observation, when one RAC node's public NIC is down, the "crs_stat -t" command doesn't show any resource as OFFLINE. So at that moment, if I use sqlplus to connect to my RAC db, it will still choose the node1 or node2 instance randomly, right? And if I am assigned to the instance on the node whose public NIC is down, will my sqlplus fail to connect to the db?
So, can we say that Oracle RAC can't provide HA if one node's public NIC is down? Or is there some other solution to this issue? Any suggestion would be appreciated.
A public NIC being down on one node means that only the other node can take the load, i.e. in your case Node 2. It may happen that CRS is unable to record the status; in that case there may be failed connections to one node.
-
I have a pre-production 2-node cluster running on Solaris 10, Oracle 10.2.0.3 with the Oracle CRS, and using a NetApp filer as the shared storage.
I also have a separate Solaris server running Grid Control 10.2.0.3, with the repository as one of the databases on the RAC (don't know if this is relevant to my problem).
Periodically both RAC nodes reboot, with no trace of why (the GC server is fine). There is nothing logged in the Solaris logs (messages file), CRS logs, Oracle logs or the NetApp logs.
All that is shown is the relevant service starting up following the shutdown.
Has anyone any experience of this, or any thoughts on which component may cause such an issue?
Thanks in advance
Bob
What type of Sun hardware are you using?
Below is the Action Plan Oracle support sent me on my SR on this issue, not sure if any of this was provided to you or would be of help.
ACTION PLAN
============
1. There is nothing in the files at all that sheds any light on the issue.
Again, 3 separate sets of clusters all losing all nodes at the same time is a very strange occurrence. Please be sure to have the admin look for
anything in common with all clusters.
2. Advise placing OSWatcher on the systems: Note 301137.1, OS Watcher User Guide.
If we should have another occurrence, we will want the OSWatcher logs from 1 hr before the issue through the issue.
Also see if the Unix admin perhaps has any OS stats from this occurrence.
3. Advise setting ntpd to run with the -x option. I do see that you are having negative time changes
at times.
-x will give us a slew rather than an abrupt time change.
4. Advise setting this when you can.
Please do the following:
set the diagwait parameter:
crsctl set css diagwait N [-force]
Where N is the number of seconds to wait for a filesystem sync to
complete (after this wait the node will reboot regardless of whether the
sync has completed). This change must be made with the clusterware
down, which will require the '-force', or with the stack up on just 1
node, after which the stack on that node must be restarted before the
stack starts up on any of the other nodes.
N should be set to 25 (25 seconds)
5. Advise that you have pcw mlr#6 Patch 5980915 on the systems as well,
but I do not believe that this was an Oracle bug; the reason for applying the patch is the advanced diagnostics in that patch set.
6. The two issues Sun is working on:
Sun is working to resolve a time skew issue and a Solaris 10 kernel SIGALRM issue, Sun#6292092, in addition to Sun#6595936.
7. We do have a diagnostic oprocd that some sites have used, but on their test systems. It stops reboots and dumps information, but I have
been hesitant to place it on production boxes. If you continue to have issues we may consider downloading the oprocd_skewfix_noreboot from
Bug 6279879, but at this time I do not believe that is warranted.
-
Rac node failed how do you bring it back up?
Example: If there are 3 RAC nodes and one becomes unavailable/fails, how do you bring it back up?
There are typically two basic reasons why a RAC node will go down.
A cluster issue, causing the node to be evicted. This usually means the node lost access to the cluster storage (e.g. no access to voting disks), or lost access to the Interconnect.
An o/s issue causing the node to fail. E.g. a kernel panic due to a page swap and memory not syncing, or a soft CPU lockup, etc.
You need to determine why it went down, and determine what is needed to enable it to successfully join the cluster again.
-
Hi experts,
We are prototyping a RAC database upgrade from 10.1.0 to 10.2.0.4. I just created a new directory as the 10.2.0 ORACLE_HOME. Since the 10gR1 database is running, all the environment variables set by the login profile point to the current 10.1.0 database configuration. I need advice on a few questions:
1) At this point, I only want to use OUI to install the 10gR2 binaries and the 10.2.0.4 binaries. I thought I could just kick off runInstaller to get it done. However, after reading the Oracle documentation and a few articles, I am not sure whether I should set any environment variables.
One article recommends
. set ORACLE_BASE=/u01/app/oracle
. set ORACLE_SID=orcl1 # Each RAC node must have a unique Oracle SID!
. set LD_LIBRARY_PATH=$ORACLE_HOME/lib
. unset ORACLE_HOME
Others:
. set ORACLE_HOME
. set PATH
. set LD_LIBRARY_PATH
. set ORA_CRS_HOME
. unset TNS_ADMIN
I know I need to have all of them set properly prior to 10gr2 upgrade. Do I need to set any of them at all just to install 10gr2 binaries and patchset via OUI? If so, which ones?
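One way to try a given set of variables without touching the login profile is to build a separate environment for the installer process only. The following is a minimal sketch in Python, not a recommendation of which variables are correct. The function name is hypothetical, the example values are the ones from the first article quoted above, and the `./runInstaller` path in the comment is an assumption.

```python
import os
from typing import Dict, Iterable, Mapping, Optional


def build_installer_env(
    set_vars: Mapping[str, str],
    unset_vars: Iterable[str],
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of base_env with set_vars applied and unset_vars removed."""
    # Copy so that the calling shell's environment is never modified.
    env = dict(os.environ if base_env is None else base_env)
    env.update(set_vars)
    for name in unset_vars:
        # Removing a variable that is not set is not an error.
        env.pop(name, None)
    return env


if __name__ == "__main__":
    env = build_installer_env(
        set_vars={"ORACLE_BASE": "/u01/app/oracle", "ORACLE_SID": "orcl1"},
        unset_vars=["ORACLE_HOME"],
    )
    print("ORACLE_BASE =", env.get("ORACLE_BASE"))
    print("ORACLE_HOME set:", "ORACLE_HOME" in env)
    # The installer could then be launched with this environment, e.g.
    # subprocess.run(["./runInstaller"], env=env)  # path is an assumption
```

Because the environment is passed only to the child process, the running 10.1.0 database sessions keep the variables from the login profile.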
2) There are a few 10.2.0.4 patches that need to be applied with opatch to each cluster node.
I noticed that some of the post-processing involves SQL executed via sqlplus.
My guess is that I should hold off on these SQL steps for now and execute them after the upgrade to 10.2.0.4 completes. Is this the correct interpretation of the patch application?
Thanks, Newbie
There is a CRS PSU for CRS 10.2:
Patch# 8705958 - 10.2.0.4.2 for CRS PSU 2
Should the PSU be applied to CRS prior to the database upgrade from 10.1 to 10.2? Can it be applied after the upgrade? What is the general guideline on this?
Thanks.
-
How to execute DBMS_JOB at exactly one RAC node
Hello,
after unsuccessfully searching for "RAC" and "DBMS_JOB" I open this thread.
Can you tell me how to dedicate one RAC-node for doing my batch-jobs
which are started by using dbms_job (so there is no tnsnames.ora which is used).
Thanks in advance
Hi,
Let's say the instances are named I1, I2, I3 and I4.
Issue:
ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0 SCOPE=BOTH SID='I1';
ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0 SCOPE=BOTH SID='I2';
ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0 SCOPE=BOTH SID='I3';
ALTER SYSTEM SET JOB_QUEUE_PROCESSES=10 SCOPE=BOTH SID='I4';
So that only instance I4 will run jobs.
Regards,
Yoann.
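For clusters with more instances, the same statements can be generated rather than typed by hand. This is a minimal sketch in Python, assuming the approach from the reply above (zero job queue processes everywhere except one instance). The function name is hypothetical and the output is only text to be reviewed and run by a DBA.

```python
from typing import Iterable, List


def job_queue_statements(
    instances: Iterable[str], job_instance: str, processes: int = 10
) -> List[str]:
    """Build ALTER SYSTEM statements so that only job_instance runs jobs."""
    sids = list(instances)
    if job_instance not in sids:
        raise ValueError(f"{job_instance!r} is not one of the instances {sids}")
    statements = []
    for sid in sids:
        # Only the dedicated instance keeps a non-zero job queue.
        value = processes if sid == job_instance else 0
        statements.append(
            f"ALTER SYSTEM SET JOB_QUEUE_PROCESSES={value} SCOPE=BOTH SID='{sid}';"
        )
    return statements


if __name__ == "__main__":
    for statement in job_queue_statements(["I1", "I2", "I3", "I4"], "I4"):
        print(statement)
```

Called with the four instance names from the reply, it prints the same four statements shown there.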