Row cache wait in Oracle RAC

Hi,
Our application is running on Oracle RAC. At certain times of the day, the application responds very slowly. At these times, we observe that row cache waits are very high. We have even tried altering the sys.AUDSES$ sequence, changing its cache size from the default of 20 to 10000, but this did not help.
Can anyone suggest a solution for this problem? And why does this problem occur?

Hi,
it looks like your problem is related to the fact that you do not cache sequences (this is a well-known RAC tuning topic).
Oracle introduced sequences (definitely the wrong name) to generate unique numbers, not to record the time order of events, preserve an ordering, or produce ascending numbers with no gaps.
Ordering a sequence of events is a serialization process that should not be implemented with a sequence.
Now, if you do not cache sequences, then in RAC the lock (enqueue) on the sequence, which is required whenever you ask for the next set of values, is a global resource on which inter-instance contention occurs.
Furthermore, if the application has a high volume of inserts, NOCACHE sequences lead to inter-instance index block contention.
Oracle says that the default cache value of 20 for sequences is inappropriate for most RAC implementations, and caches of 1000 values or more are common. You need to test to find your ideal value.
Now it is up to you to decide between:
- keep things as they are and have a non scalable RAC installation
- find a way to cache sequences without harming the application assumptions.
Hope it helps,
Regards,
Corrado
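
To put rough numbers on the caching advice above, here is a minimal Python sketch. It assumes the simplified model that a sequence's dictionary row is updated once per cache refill (and once per NEXTVAL with NOCACHE); it ignores cached values lost at instance restart and the fact that each RAC instance holds its own cache. The function name and the call count are made up for illustration.

```python
import math

def seq_dictionary_updates(nextval_calls: int, cache_size: int) -> int:
    """Approximate number of dictionary-row updates for a sequence.

    cache_size <= 1 models NOCACHE: every NEXTVAL touches the dictionary.
    Otherwise one update is assumed per refill of cache_size values.
    """
    values_per_refill = max(cache_size, 1)
    return math.ceil(nextval_calls / values_per_refill)

calls = 1_000_000  # hypothetical number of NEXTVAL calls in a busy period
for cache in (0, 20, 1000, 10000):
    label = "NOCACHE" if cache <= 1 else f"CACHE {cache}"
    print(f"{label:>12}: {seq_dictionary_updates(calls, cache):>9,} updates")
```

Under this model, going from NOCACHE to CACHE 1000 cuts the globally coordinated updates by three orders of magnitude, which is why the cache size is usually the first thing to test.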

Similar Messages

Enable Cache Fusion in Oracle RAC

Hi gurus,
I cannot find on Google how to enable and test the Cache Fusion feature in Oracle RAC. Could you help me, please?
Best.

>
I don't know. I cannot find parameter which enable or disable CF
>
As Aman has already stated, the feature is already present:
http://download.oracle.com/docs/cd/B28359_01/server.111/b28318/consist.htm#CNCPT1317
>
I even cannot find information how to test CF to ensure that it really works?!
>
http://download.oracle.com/docs/cd/B28359_01/rac.111/b28254/monitor.htm#RACAD981
@Aman
Cache Fusion is the technology which makes the 10g RAC, 10g RAC. And 11g Database as well :)
Edited by: Amy De Caj on Jul 19, 2009 4:26 AM
Edited by: Amy De Caj on Jul 19, 2009 4:28 AM

Performance issues; waited too long for a row cache enqueue lock!

hi Experts,
OS: Oracle Solaris on SPARC (64-bit)
DB version:
SQL> select * from V$VERSION;
BANNER
Oracle Database 11g Release 11.2.0.1.0 - 64bit Production
PL/SQL Release 11.2.0.1.0 - Production
CORE 11.2.0.1.0 Production
TNS for Solaris: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production
SQL>
We have seen 100% CPU usage and high database load, so I checked the instance and saw that there were many blocking sessions and more than 71 sessions running the same select:
select tablespace_name as tbsname from (select tablespace_name,sum(bytes)/1024/1024 free_mb,0 total_mb,0 max_mb from dba_free_space group by tablespace_name union select tablespace_name, 0 current_mb,sum(bytes)/1024/1024 total_mb, sum(decode(maxbytes, 0, bytes, maxbytes))/1024/1024 max_mb from dba_data_files group by tablespace_name) group by tablespace_name having round((sum(total_mb)-sum(free_mb))/sum(max_mb)*100) > 95
Blocking sessions are running queries like this:
SELECT * from MYTABLE WHERE MYCOL=:1 FOR UPDATE;
These select queries come from a cron job that runs every 10 minutes to check the tablespaces, so I first killed (kill -9 pid) those select statements, and the load dropped to 13% CPU usage. The blocking sessions were still there, and I did not kill them because I was waiting for confirmation from the app team. After a few hours the CPU usage had never gone below 13%, and I saw many errors:
WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=...
System State dumped to trace file .....trc
After that, we decided to restart the DB to release the locks!
I would like to understand why, during the load, we were not able to run those select statements, why the scheduled statspack snapshot reports were not able to finish, and likewise the automatic database statistics... why did 5 FOR UPDATE statements lock the whole DB?

user12035575 wrote:
SELECT FOR UPDATE will only lock the table row until the transaction is completed.
"WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK" happens when it needs to acquire a lock on data dictionary. Did you check the trace file associated with the statement?
The trace file is too long. Which information should I focus on?

Error: WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=26

Hi every one,
Today I ran into a problem: the application cannot connect to the database because the database hangs (I also cannot connect to the database with sqlplus). I checked the alert log, and there is only one error:
WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=26
This is not the first time the error has appeared; it happens about once a month. I have to reset the server so that the database releases all its memory, but I do not think that is a real solution!
Could you give a recommendation for this?
Regards.

The Row Cache is actually the Data Dictionary Cache. It is where definitions from the data dictionary (tablespaces, objects, users etc) are loaded into memory.
There would be an associated trace file written with the occurrence of this warning.
See Oracle Support Note 278316.1 for more information
Hemant K Chitale

Latch: row cache objects

Hello everyone,
Note: Apologize for the bad formatting, tried but it seems I forgot how to use it
BANNER
Oracle Database 11g Release 11.2.0.2.0 - 64bit Production
I've seen high "*latch: row cache objects*" in SP/ASH report for ~14 hours back, when the users were unable to connect to the database. There were,
WARNING: inbound connection timed out (ORA-3136)
Time: 30-APR-2012 02:24:36
Tracing not turned on.
Tns error struct:
errors all over the alert log for the duration of 6 minutes of the problem.
I've put few records in bold due to which I concluded that the problem was with "dc_users" thing.
Can anybody tell me how/where I should proceed forward ?
SP report:
Instance Efficiency Indicators
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:  100.00       Redo NoWait %:  100.00
               Buffer Hit %:   99.84  Optimal W/A Exec %:  100.00
              Library Hit %:   97.43        Soft Parse %:   87.86
         Execute to Parse %:   22.54         Latch Hit %:   99.95
Parse CPU to Parse Elapsd %:    0.30     % Non-Parse CPU:   87.83
 Shared Pool Statistics        Begin     End
             Memory Usage %:   45.09   46.98
    % SQL with executions>1:   11.49   13.15
  % Memory for SQL w/exec>1:   72.96   21.33
Top 5 Timed Events                                          Avg  %Total
~~~~~~~~~~~~~~~~~~                                         wait    Call
Event                              Waits     Time (s)      (ms)    Time
latch: row cache objects           6,655      634,260     95306    97.0
log file sync                    289,923        6,469        22     1.0
CPU time                                        5,039                .8
db file sequential read          310,084        2,840         9      .4
log file parallel write          451,706        1,144         3      .2
ASH Report
Analysis Begin Time: 30-Apr-12 02:24:00
Analysis End Time: 30-Apr-12 02:30:00
Elapsed Time: 6.0 (mins)
Begin Data Source: DBA_HIST_ACTIVE_SESS_HISTORY
in AWR snapshot 12185
End Data Source: DBA_HIST_ACTIVE_SESS_HISTORY
in AWR snapshot 12185
Sample Count: 1,385
Average Active Sessions: 38.47
Avg. Active Session per CPU: 1.60
Report Target: None specified
Top User Events DB/Inst: NIKU/niku (Apr 30 02:24 to 02:30)
                                                           Avg Active
Event                       Event Class       % Event        Sessions
latch: row cache objects    Concurrency         75.45           29.03
CPU + Wait for CPU          CPU                  9.75            3.75
log file sync               Commit               3.83            1.47
db file sequential read     User I/O             3.61            1.39
Top Event P1/P2/P3 Values DB/Inst: NIKU/niku (Apr 30 02:24 to 02:30)
Event                       % Event   P1 Value, P2 Value, P3 Value    % Activity
                                      Parameter 1  Parameter 2  Parameter 3
latch: row cache objects      75.60   "42287858200","279","0"              75.60
                                      address      number       tries
1* select addr, latch#, child#, name, misses, gets from v$latch_children where name like '%row%cache%objec%' order by gets , misses
niku> /
ADDR                 LATCH#     CHILD# NAME                                                   MISSES       GETS
0000000A16FF21C8        279         26 row cache objects                                           0          0
0000000A16FF14C8        279          2 row cache objects                                           0          0
00000009D88D7ED8        279          3 row cache objects                                           0          0
0000000A16FF1B48        279         14 row cache objects                                           0          0
00000009D88D8558        279         15 row cache objects                                           0          0
0000000A16FF1CE8        279         17 row cache objects                                           0          0
0000000A26265A28        279         19 row cache objects                                           0          0
0000000A16FF1E88        279         20 row cache objects                                           0          0
00000009D88D8898        279         21 row cache objects                                           0          0
0000000A26265BC8        279         22 row cache objects                                           0          0
0000000A16FF2028        279         23 row cache objects                                           0          0
00000009D88D8A38        279         24 row cache objects                                           0          0
0000000A26265D68        279         25 row cache objects                                           0          0
00000009D88D8BD8        279         27 row cache objects                                           0          0
0000000A26265F08        279         28 row cache objects                                           0          0
00000009D88D8D78        279         30 row cache objects                                           0          0
0000000A262660A8        279         31 row cache objects                                           0          0
0000000A16FF2508        279         32 row cache objects                                           0          0
0000000A16FF26A8        279         35 row cache objects                                           0          0
00000009D88D90B8        279         36 row cache objects                                           0          0
0000000A262663E8        279         37 row cache objects                                           0          0
0000000A262668C8        279         46 row cache objects                                           0          0
0000000A26266A68        279         49 row cache objects                                           0          0
0000000A16FF2368        279         29 row cache objects                                           0         11
0000000A16FF2848        279         38 row cache objects                                           0        116
0000000A16FF29E8        279         41 row cache objects                                           0        200
00000009D88D93F8        279         42 row cache objects                                           0        318
00000009D88D9258        279         39 row cache objects                                           0       1010
0000000A16FF2EC8        279         50 row cache objects                                           0       1406
00000009D88D9598        279         45 row cache objects                                           0       1472
0000000A26266588        279         40 row cache objects                                           0       1705
0000000A26266728        279         43 row cache objects                                           0       7383
0000000A16FF2B88        279         44 row cache objects                                           0      32346
00000009D88D98D8        279         51 row cache objects                                          19      63948
0000000A26265888        279         16 row cache objects                                           0      88045
0000000A26266248        279         34 row cache objects                                           0     141176
00000009D88D9738        279         48 row cache objects                                           0     326672
0000000A16FF19A8        279         11 row cache objects                                         867    1770385
00000009D88D8078        279          6 row cache objects                                           9    1979542
0000000A16FF2D28        279         47 row cache objects                                           2    3435018
00000009D88D86F8        279         18 row cache objects                                        2557   14956121
0000000A26265068        279          1 row cache objects                                         224   24335868
0000000A262653A8        279          7 row cache objects                                       29760 133991553
00000009D88D8F18        279         33 row cache objects                                       60612 677263122
00000009D88D83B8        279         12 row cache objects                                       23981 739014460
0000000A26265208        279          4 row cache objects                                    19973399 852043775
0000000A26265548        279         10 row cache objects                                      280137 856097342
00000009D88D8218        279          9 row cache objects                                   715879777 1219000976
0000000A262656E8        279         13 row cache objects                                     3856073 2397402780
0000000A16FF1668        279          5 row cache objects                                    12763217 2920278217
*0000000A16FF1808        279          8 row cache objects                                    67329804 4145389092*
51 rows selected.
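
As a cross-check on the listing above, the P1 value that ASH reports for the latch wait is the latch address in decimal, while v$latch_children shows ADDR in hex. A small Python sketch converts one to the other and looks it up; the dictionary holds only a few ADDR/CHILD# pairs copied from the listing, and the variable names are made up for illustration.

```python
# P1 of "latch: row cache objects" from the ASH "Top Event P1/P2/P3 Values" section
p1_decimal = 42287858200

# A few (ADDR -> CHILD#) pairs copied from the v$latch_children output above
child_by_addr = {
    "0000000A16FF1808": 8,
    "00000009D88D8218": 9,
    "0000000A16FF1668": 5,
    "0000000A262656E8": 13,
}

# ADDR is printed as 16 upper-case hex digits, zero padded
p1_hex = format(p1_decimal, "016X")
waited_child = child_by_addr.get(p1_hex)

print(f"P1 {p1_decimal} = 0x{p1_hex} -> child# {waited_child}")
```

The address resolves to child# 9, the child with by far the most misses, and not to the child# 8 row marked in bold, which has the most gets.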
niku> list
1 select addr, latch#, child#, name, misses, gets from v$latch_children where name like '%row%cache%objec%' order by gets , misses
niku> select distinct s.kqrstcln latch#,r.cache#,r.parameter name,r.type,r.subordinate#
from v$rowcache r,x$kqrst s
where r.cache#=s.kqrstcid
order by 1,4,5;
    LATCH#     CACHE# NAME                                               TYPE        SUBORDINATE#
         1          3 dc_rollback_segments                               PARENT
         2          1 dc_free_extents                                    PARENT
         3          4 dc_used_extents                                    PARENT
         4          2 dc_segments                                        PARENT
         5          0 dc_tablespaces                                     PARENT
         6          5 dc_tablespace_quotas                               PARENT
         7          6 dc_files                                           PARENT
         *8         10 dc_users                                           PARENT*
         *8          7 dc_users                                           SUBORDINATE            0*
         *8          7 dc_users                                           SUBORDINATE            1*
         *8          7 dc_users                                           SUBORDINATE            2*
         9          8 dc_objects                                         PARENT
         9          8 dc_object_grants                                   SUBORDINATE            0
        10         17 dc_global_oids                                     PARENT
        11         12 dc_constraints                                     PARENT
        12         13 dc_sequences                                       PARENT
        13         16 dc_histogram_defs                                  PARENT
        13         16 dc_histogram_data                                  SUBORDINATE            0
        13         16 dc_histogram_data                                  SUBORDINATE            1
        14         54 dc_sql_prs_errors                                  PARENT
        15         32 kqlsubheap_object                                  PARENT
        16         19 dc_table_scns                                      PARENT
        16         19 dc_partition_scns                                  SUBORDINATE            0
        17         18 dc_outlines                                        PARENT
        18         14 dc_profiles                                        PARENT
        19         47 realm cache                                        PARENT
        19         47 realm auth                                         SUBORDINATE            0
        20         48 Command rule cache                                 PARENT
        21         49 Realm Object cache                                 PARENT
        21         49 Realm Subordinate Cache                            SUBORDINATE            0
        22         46 Rule Set Cache                                     PARENT
        23         34 extensible security user and rol                   PARENT
        24         35 extensible security principal pa                   PARENT
        25         37 extensible security UID to princ                   PARENT
        26         36 extensible security principal na                   PARENT
        27         33 extensible security principal ne                   PARENT
        28         38 XS security class privilege                        PARENT
        29         39 extensible security midtier cach                   PARENT
        30         43 AV row cache 1                                     PARENT
        31         44 AV row cache 2                                     PARENT
        32         45 AV row cache 3                                     PARENT
        33         15 global database name                               PARENT
        34         20 rule_info                                          PARENT
        35         21 rule_or_piece                                      PARENT
        35         21 rule_fast_operators                                SUBORDINATE            0
        36         23 dc_qmc_ldap_cache_entries                          PARENT
        37         52 qmc_app_cache_entries                              PARENT
        38         53 qmc_app_cache_entries                              PARENT
        39         27 qmtmrcin_cache_entries                             PARENT
        40         28 qmtmrctn_cache_entries                             PARENT
        41         29 qmtmrcip_cache_entries                             PARENT
        42         30 qmtmrctp_cache_entries                             PARENT
        43         31 qmtmrciq_cache_entries                             PARENT
        44         26 qmtmrctq_cache_entries                             PARENT
        45          9 qmrc_cache_entries                                 PARENT
        46         50 qmemod_cache_entries                               PARENT
        47         24 outstanding_alerts                                 PARENT
        48         22 dc_awr_control                                     PARENT
        49         25 SMO rowcache                                       PARENT
        50         40 sch_lj_objs                                        PARENT
        51         41 sch_lj_oids                                        PARENT
61 rows selected.
niku> select parameter, gets from v$rowcache order by gets desc;
PARAMETER                              GETS
dc_users                         2802019571
dc_tablespaces                   2405092307
dc_objects                       1815427326

jjk wrote:
I've already been thru the link that you've mentioned and unfortunately couldn't make much use of it.
I didn't think it was really likely to be relevant, but there was always a long shot that it might have given you a clue.
Considering the "dc_users" had maximum gets, I thought (rather as per internet) that it might be the point of contention. However I did observe high misses on child# 9 which is "dc_objects".
It's often the case that the misses are more important than the gets when you see lots of gets and misses on a few latches/caches. The bit that might have been most instructive was the dictionary cache section from the AWR, showing gets, misses, scans, scanmisses etc. It might have told us a little about what was going in and out of the dictionary cache and let us guess why.
In alert log:
Sun Apr 29 02:20:00 2012
29-APR-2012 02:20:00 -- xxxxxxx package - REGRANT_READONLY Begin re-grant read only roles
Sun Apr 29 02:24:34 2012
29-APR-2012 02:24:34 -- xxxxxxx package - REGRANT_READONLY End re-grant read only roles
Sun Apr 29 02:30:00 2012
29-APR-2012 02:30:00 -- xxxxxxx package - REGRANT_READWRITE Begin re-grant read write roles
Sun Apr 29 02:32:02 2012
29-APR-2012 02:32:02 -- xxxxxxx package - REGRANT_READWRITE End re-grant read write roles
Is this code that "regrants" roles to users who already have them? That's what it sounds like, and that sounds like something that would impact various parts of the dictionary cache, especially dc_users, and possibly dc_objects.
                                                  CPU per    Elap per     Old
Executions   Rows Processed   Rows per Exec     Exec (s)    Exec (s)  Hash Value
   161,198            1,244             0.0         0.00        0.00   978935325
select /*+ rule */ c.name, u.name from con$ c, cdef$ cd, user$ u where c.con# = cd.con# and cd.enabled = :1 and c.owner# = u.user#
   159,955          159,952             1.0         0.00        0.00  2458412332
select o.name, u.name from obj$ o, user$ u where o.obj# = :1 and o.owner# = u.user#
   159,932                6             0.0         0.00        0.00  2636710067
insert into objauth$(option$,grantor#,obj#,privilege#,grantee#,col#,sequence#) values(decode(:1,0,null,:1),:2,:3,:4,:5,decode(:6,0,null,:6),object_grant.nextval)
   147,168          147,168             1.0         0.00        0.00  3468666020
select text from view$ where rowid=:1
   124,635          124,635             1.0         0.00        0.00   564166580
select count(*) from ( select u.name from registry$ r, user$ u where r.status in (1,3,5) and r.namespace = 'SERVER'
The first one looks like a response to a constraint being breached.
The third one looks like something that might happen when you grant a privilege on an object to a user - and maybe the first one happens if the user has already got it and the insert raises a "duplicate key" error. The fourth one commonly happens when you have to re-optimize a query containing a view - and when you execute DDL (such as changing privileges on an object) you invalidate SQL and have to re-optimize it eventually. I can't remember where I've seen the second one appearing.
If you have a process that tries to do a lot of grants on objects to users and roles in a very short time, it's quite likely to create havoc in the dictionary cache - check what that package was up to and why it runs.
What is the missing information ?
When I looked at some of your posting, the output didn't match the query, some of the later columns had gone missing - this might have been my browser rather than your input though.
Regards
Jonathan Lewis
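
The remark above that misses can matter more than gets can be checked against the v$latch_children figures posted earlier in this thread. The Python sketch below ranks a handful of the busiest children by miss ratio; the numbers and the child-to-cache names are copied from the two listings in the question, only the six children with the most gets are included, and the helper name is made up for illustration.

```python
# (child#, cache name from the x$kqrst mapping, misses, gets) as posted above
latch_children = [
    (8,  "dc_users",          67329804,  4145389092),
    (5,  "dc_tablespaces",    12763217,  2920278217),
    (13, "dc_histogram_defs", 3856073,   2397402780),
    (9,  "dc_objects",        715879777, 1219000976),
    (10, "dc_global_oids",    280137,    856097342),
    (4,  "dc_segments",       19973399,  852043775),
]

def miss_ratio(misses: int, gets: int) -> float:
    """Fraction of latch gets that missed on the first attempt."""
    return misses / gets if gets else 0.0

# Highest miss ratio first
ranked = sorted(latch_children, key=lambda r: miss_ratio(r[2], r[3]), reverse=True)

for child, cache, misses, gets in ranked:
    print(f"child# {child:>2} {cache:<18} miss ratio {miss_ratio(misses, gets):8.4%}")
```

Ranked this way, child# 9 (dc_objects) misses on more than half of its gets, while child# 8 (dc_users), which has the most gets, misses on under 2% of them.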

"latch: row cache objects" and high "VERSION_COUNT"

Hello,
we are facing a situation where the database spends most of its time waiting for latches in the shared pool (as seen in the AWR report).
All statements issued by the application use bind variables, but V$SQL shows that some of them nevertheless have a relatively high version_count (> 300) and many invalidations (100 - 200), even though the tables involved are very small (some have no more than 3 or 4 rows).
Here is some (hopefully enough) information about the environment
Version: Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production (on RedHat EL 5)
Parameters:
cursor_bind_capture_destination       memory+disk
cursor_sharing                        EXACT
cursor_space_for_time                 FALSE
filesystemio_options                  none
hi_shared_memory_address              0
memory_max_target                     12288M
memory_target                         12288M
object_cache_optimal_size             102400
open_cursors                          300
optimizer_capture_sql_plan_baselines FALSE
optimizer_dynamic_sampling            2
optimizer_features_enable             11.2.0.2
optimizer_index_caching               0
optimizer_index_cost_adj              100
optimizer_mode                        ALL_ROWS
optimizer_secure_view_merging         TRUE
optimizer_use_invisible_indexes       FALSE
optimizer_use_pending_statistics      FALSE
optimizer_use_sql_plan_baselines      TRUE
plsql_optimize_level                  2
session_cached_cursors                50
shared_memory_address                 0
The shared pool size (according to AWR) is 4,832M
The buffer cache is 3,008M
Now, my question: is a version_count of > 300 a problem (we have about 10-15 of those with a total of ~7000 statements in v$sqlarea). Those are also the statements listed in the AWR report at the top in the section "SQL ordered by Version Count" and "SQL ordered by Sharable Memory"
Is it possible that those statements are causing the latch contention in the shared pool?
I went through https://blogs.oracle.com/optimizer/entry/why_are_there_more_cursors_in_11g_for_my_query_containing_bind_variables_1
The tables involved are fairly small and all the execution plans for each cursor are identical.
I can understand some of the invalidations that happen, because we have 7 schemas that have identical tables, but from my understanding that shouldn't cause such a high invalidation number. Or am I mistaken?
I'm not that experienced with Oracle tuning at that level, so I would appreciate any pointer on how I can find out where exactly the latch problem occurs
After flushing the shared pool, the problem seems to go away for a while. But apparently that is only fighting symptoms, not fixing the root cause of the problem.
Some of the statements in question:
SELECT * FROM QRTZ_SIMPLE_TRIGGERS WHERE TRIGGER_NAME = :1 AND TRIGGER_GROUP = :2
UPDATE QRTZ_TRIGGERS SET TRIGGER_STATE = :1 WHERE TRIGGER_NAME = :2 AND TRIGGER_GROUP = :3 AND TRIGGER_STATE = :4
UPDATE QRTZ_TRIGGERS SET TRIGGER_STATE = :1 WHERE JOB_NAME = :2 AND JOB_GROUP = :3 AND TRIGGER_STATE = :4
SELECT TRIGGER_STATE FROM QRTZ_TRIGGERS WHERE TRIGGER_NAME = :1 AND TRIGGER_GROUP = :2
UPDATE QRTZ_SIMPLE_TRIGGERS SET REPEAT_COUNT = :1, REPEAT_INTERVAL = :2, TIMES_TRIGGERED = :3 WHERE TRIGGER_NAME = :4 AND TRIGGER_GROUP = :5
DELETE FROM QRTZ_TRIGGER_LISTENERS WHERE TRIGGER_NAME = :1 AND TRIGGER_GROUP = :2
So all of them are using bind variables.
I have seen that the columns used in the where clause all have histograms available. Would removing them reduce the number of invalidations?
Unfortunately I did not save the information from v$sql_shared_cursor before the shared pool was flushed, but most of the invalidations occurred in the ROLL_INVALID_MISMATCH column, if that is of any help. There were also some invalidations reported for AUTH_CHECK_MISMATCH and TRANSLATION_MISMATCH, but to my understanding those are caused by executing the statement in different schemas.
Looking at v$latch_missed, most of the waits for parent = 'row cache objects' are for "kqrpre: find obj" and "kqreqd: reget"

>
In the AWR report, what does the Dictionary Cache Stats section say?
>
Here they are:
Dictionary Cache Stats
Cache                 Get Requests      Pct Miss     Scan Reqs    Mod Reqs      Final Usage
dc_awr_control        65                0.00         0            2             1
dc_constraints        729               33.33        0            729           1
dc_global_oids        60                23.33        0            0             31
dc_histogram_data     7,397             10.53        0            0             2,514
dc_histogram_defs     21,797            9.83         0            0             5,239
dc_object_grants      4                 25.00        0            0             12
dc_objects            27,683            2.29         0            223           2,581
dc_profiles           1,842             0.00         0            0             1
dc_rollback_segments 1,634             0.00         0            0             39
dc_segments           7,335             6.94         0            360           1,679
dc_sequences          139               5.76         0            139           19
dc_table_scns         53                100.00       0            0             0
dc_tablespace_quotas 1,956             0.10         0            0             4
dc_tablespaces        17,488            0.00         0            0             11
dc_users              58,013            0.03         0            0             164
global database name 4,261             0.00         0            0             1
outstanding_alerts    54                0.00         0            0             9
sch_lj_oids           4                 0.00         0            0             2
Library Cache Activity
Namespace             Get Requests     Pct Miss     Pin Requests          Pct Miss      Reloads   Invalidations
ACCOUNT_STATUS        3,664            0.03         0                                   0         0
BODY                  560              2.14         2,343                 0.60          0         0
CLUSTER               52               0.00         52                    0.00          0         0
DBLINK                3,668            0.00         0                                   0         0
EDITION               1,857            0.00         3,697                 0.00          0         0
INDEX                 99               19.19        99                    19.19         0         0
OBJECT ID             68               100.00       0                                   0         0
SCHEMA                2,646            0.00         0                                   0         0
SQL AREA              32,996           2.26         1,142,497             0.21          189       226
SQL AREA BUILD        848              62.15        0                                   0         0
SQL AREA STATS        860              82.09        860                   82.09         0         0
TABLE/PROCEDURE       17,713           2.62         26,112                4.88          61        0
TRIGGER               1,704            2.00         6,737                 0.52          1         0
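
One way to read the Dictionary Cache Stats above is to turn Pct Miss into an approximate miss count, since a high percentage on a rarely used cache matters less than a moderate one on a busy cache. The Python sketch below does that for a subset of the rows; the figures are copied from the report, the results are estimates because Pct Miss is rounded, and the function name is made up for illustration.

```python
# (cache, get requests, pct miss) copied from the Dictionary Cache Stats above
dict_cache_stats = [
    ("dc_constraints",    729,   33.33),
    ("dc_histogram_data", 7397,  10.53),
    ("dc_histogram_defs", 21797, 9.83),
    ("dc_objects",        27683, 2.29),
    ("dc_segments",       7335,  6.94),
    ("dc_table_scns",     53,    100.00),
    ("dc_users",          58013, 0.03),
]

def estimated_misses(get_requests: int, pct_miss: float) -> float:
    """Approximate absolute misses implied by the reported percentage."""
    return get_requests * pct_miss / 100.0

# Caches with the most estimated misses first
ranked = sorted(dict_cache_stats,
                key=lambda r: estimated_misses(r[1], r[2]),
                reverse=True)

for cache, gets, pct in ranked:
    print(f"{cache:<18} gets {gets:>7,}  pct miss {pct:6.2f}  ~misses {estimated_misses(gets, pct):8.0f}")
```

In this snapshot the histogram caches account for most of the estimated misses and dc_users for almost none. That does not by itself explain the latch waits, but it shows where dictionary rows were being loaded most often during the interval.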

ROW CACHE ENQUEUE LOCK/library cache load lock leads to database hang

We faced a database hang on a 3-node 11i ERP 9i RAC database.
We saw library cache load lock timeout events reported in the alert log,
then a few ORA-600 errors and later ROW CACHE ENQUEUE LOCK timeout events. Eventually the database hung and we had to bounce the services.
We created support SR 7845542.992 for RCA.
Support says to increase the shared pool size to avoid shared pool fragmentation and reloads, and additionally to upgrade to a 10g database.
I am not convinced that increasing the shared pool size or upgrading to 10g would solve this. Furthermore, even 10g has such issues reported.
I saw a couple of bugs mentioning that such an issue can happen due to a deadlock between sessions holding latches.
Kindly let me know your view on the issue.
If required, I can attach the statspack report for more information.

Many thanks, I was keen to have your update.
There are 8 CPUs on each node. Reloads were very high during this time period, but normally reloads are not high.
Statspack details for 3 nodes
STATSPACK report for
DB Name         DB Id    Instance     Inst Num Release     Cluster Host
PROD            21184234 PROD1               1 9.2.0.8.0   YES     npi-or-db-p-
                                                                   11.npi.corp
              Snap Id     Snap Time      Sessions Curs/Sess Comment
Begin Snap:    149817 30-Oct-09 13:00:09      574 #########
End Snap:    149837 30-Oct-09 14:00:17      602 #########
   Elapsed:               60.13 (mins)
Cache Sizes (end)
~~~~~~~~~~~~~~~~~
               Buffer Cache:     8,192M      Std Block Size:          8K
           Shared Pool Size:     1,024M          Log Buffer:     10,240K
Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                  Redo size:            122,414.93             11,449.13
              Logical reads:             69,550.76              6,504.89
              Block changes:                928.41                 86.83
             Physical reads:                196.24                 18.35
            Physical writes:                 28.65                  2.68
                 User calls:                343.97                 32.17
                     Parses:                558.61                 52.25
                Hard parses:                 43.48                  4.07
                      Sorts:                467.24                 43.70
                     Logons:                  0.63                  0.06
                   Executes:              2,046.99                191.45
               Transactions:                 10.69
% Blocks changed per Read:    1.33    Recursive Call %:     97.59
Rollback per transaction %:    5.07       Rows per Sort:     15.85
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %: 100.00       Redo NoWait %:    100.00
            Buffer Hit   %:   99.72    In-memory Sort %:    100.00
            Library Hit   %:   96.79        Soft Parse %:     92.22
         Execute to Parse %:   72.71         Latch Hit %:     99.77
Parse CPU to Parse Elapsd %:   60.10     % Non-Parse CPU:     78.07
-> s - second
-> cs - centisecond -     100th of a second
-> ms - millisecond -    1000th of a second
-> us - microsecond - 1000000th of a second
-> ordered by wait time desc, waits desc (idle events last)
                                                                   Avg
                                                     Total Wait   wait    Waits
Event                               Waits   Timeouts   Time (s)   (ms)     /txn
db file sequential read           249,234          0      1,537      6      6.5
db file scattered read             61,776          0        769     12      1.6
row cache lock                    780,098         10        566      1     20.2
library cache lock                697,849        157        432      1     18.1
latch free                        127,926      4,715        387      3      3.3
global cache cr request           370,770      3,091        309      1      9.6
PL/SQL lock timer                      59         58        112   1903      0.0
wait for scn from all nodes       303,572         18        103      0      7.9
library cache pin                  26,231          2        100      4      0.7
global cache null to x             17,717        716         92      5      0.5
buffer busy waits                   5,388         18         74     14      0.1
db file parallel read               5,245          0         69     13      0.1
log file sync                      20,407         29         66      3      0.5
enqueue                            52,200         70         60      1      1.4
buffer busy global CR               4,845         33         55     11      0.1
CGS wait for IPC msg              412,512    407,106         50      0     10.7
ksxr poll remote instances      1,279,565    483,046         48      0     33.2
log file parallel write           160,040          0         42      0      4.1
library cache load lock             1,491          2         29     20      0.0
global cache open x                19,507        344         28      1      0.5
buffer busy global cache              957          0         22     23      0.0
global cache s to x                16,516        180         20      1      0.4
db file parallel write             11,120          0         12      1      0.3
log file sequential read              618          0         11     18      0.0
DFS lock handle                    23,768          0         10      0      0.6
control file sequential read        8,563          0          4      0      0.2
KJC: Wait for msg sends to c        1,549         57          4      3      0.0
lock escalate retry                    76         76          4     52      0.0
SQL*Net break/reset to clien       12,546          0          3      0      0.3
SQL*Net more data to client        85,773          0          3      0      2.2
control file parallel write         1,265          0          2      1      0.0
global cache null to s                648         23          1      2      0.0
global cache busy                     200          0          1      5      0.0
global cache open s                 1,493         28          1      1      0.0
log file switch completion             12          0          1     61      0.0
PX Deq Credit: send blkd              161         70          1      4      0.0
kksfbc child completion               119        118          1      5      0.0
PX Deq: reap credit                 5,948      5,456          0      0      0.2
PX Deq: Execute Reply                  83         29          0      3      0.0
process startup                         8          0          0     25      0.0
LGWR wait for redo copy               992         12          0      0      0.0
IPC send completion sync              450        450          0      0      0.0
PX Deq: Parse Reply                   100         28          0      1      0.0
undo segment extension             10,380     10,372          0      0      0.3
PX Deq: Join ACK                      146         65          0      1      0.0
buffer deadlock                       222        221          0      0      0.0
async disk IO                       1,179          0          0      0      0.0
wait list latch free                    2          0          0     16      0.0
PX Deq: Msg Fragment                  112         28          0      0      0.0
Library Cache Activity for DB: PROD Instance: PROD1 Snaps: 149817 -149837
->"Pct Misses" should be very low
                         Get Pct        Pin        Pct               Invali-
Namespace           Requests Miss     Requests     Miss     Reloads dations
BODY                 116,007    1.1        133,347   19.9     24,338        0
CLUSTER                4,224    0.6          5,131    1.0          0        0
INDEX                 15,048   24.1         13,798   26.4          2        0
JAVA DATA                 82    0.0            692   39.6        136        0
JAVA RESOURCE             66   39.4            206   25.2         12        0
PIPE                   1,140    0.5          1,160    0.5          0        0
SQL AREA           1,197,908   12.6     13,517,660    1.5    111,833       73
TABLE/PROCEDURE    3,847,439    0.8      4,230,265    7.9    142,200        0
TRIGGER                8,444    2.4          8,657   18.5      1,274        0
                    GES Lock      GES Pin      GES Pin   GES Inval GES Invali-
Namespace           Requests     Requests     Releases    Requests     dations
BODY                       1        1,234        1,258         985           0
CLUSTER                3,222           25           25          25           0
INDEX                 13,792        3,641        3,631       3,629           0
JAVA DATA                  0            0            0           0           0
JAVA RESOURCE              0           26           25           0           0
PIPE                       0            0            0           0           0
SQL AREA                   0            0            0           0           0
TABLE/PROCEDURE      857,137       13,130       13,264      10,762           0
TRIGGER                    0          200          202         200           0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
STATSPACK report for
DB Name         DB Id    Instance     Inst Num Release     Cluster Host
PROD            21184234 PROD2               2 9.2.0.8.0   YES     npi-or-db-p-
                                                                   12.npi.corp
              Snap Id     Snap Time      Sessions Curs/Sess Comment
Begin Snap:    149847 30-Oct-09 14:00:05      493 #########
End Snap:    149857 30-Oct-09 15:00:02      432 #########
   Elapsed:               59.95 (mins)
Cache Sizes (end)
~~~~~~~~~~~~~~~~~
               Buffer Cache:     8,192M      Std Block Size:          8K
           Shared Pool Size:     1,024M          Log Buffer:     10,240K
Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                  Redo size:             71,853.44             32,058.65
              Logical reads:            273,904.84            122,207.36
              Block changes:                889.13                396.70
             Physical reads:                 40.40                 18.03
            Physical writes:                 20.97                  9.35
                 User calls:                153.74                 68.60
                     Parses:                 66.19                 29.53
                Hard parses:                  2.66                  1.19
                      Sorts:                 25.70                 11.47
                     Logons:                  0.16                  0.07
                   Executes:                726.41                324.10
               Transactions:                  2.24
% Blocks changed per Read:    0.32    Recursive Call %:     92.41
Rollback per transaction %:    4.84       Rows per Sort:    193.55
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %: 100.00       Redo NoWait %:     99.99
            Buffer Hit   %:   99.99    In-memory Sort %:    100.00
            Library Hit   %:   99.35        Soft Parse %:     95.97
         Execute to Parse %:   90.89         Latch Hit %:     99.99
Parse CPU to Parse Elapsd %:   36.55     % Non-Parse CPU:     98.28
Wait Events for DB: PROD Instance: PROD2 Snaps: 149847 -149857
-> s - second
-> cs - centisecond -     100th of a second
-> ms - millisecond -    1000th of a second
-> us - microsecond - 1000000th of a second
-> ordered by wait time desc, waits desc (idle events last)
                                                                   Avg
                                                     Total Wait   wait    Waits
Event                               Waits   Timeouts   Time (s)   (ms)     /txn
enqueue                            65,823     33,667     90,459   1374      8.2
row cache lock                     38,996        560      1,795     46      4.8
PX Deq Credit: send blkd              522        499      1,223   2344      0.1
PX Deq: Parse Reply                   466        416        987   2117      0.1
db file sequential read            50,130          0        421      8      6.2
library cache lock                 78,842        172        210      3      9.8
db file scattered read              6,904          0        152     22      0.9
global cache cr request            84,801        575        113      1     10.5
latch free                          8,096        736         65      8      1.0
log file sync                       5,676         27         41      7      0.7
wait for scn from all nodes        18,891         10         24      1      2.3
CGS wait for IPC msg              394,678    392,142         21      0     49.0
library cache pin                   1,339          0         17     13      0.2
global cache null to x              2,145         48         16      8      0.3
global cache s to x                 3,242         32         16      5      0.4
buffer busy waits                     366         10         15     40      0.0
ksxr poll remote instances         70,990     31,295         14      0      8.8
db file parallel read                 359          0         11     31      0.0
global cache open x                 2,708         55         10      4      0.3
async disk IO                       3,474          0          8      2      0.4
global cache open s                 3,470         10          6      2      0.4
log file parallel write            13,076          0          5      0      1.6
global cache busy                      58         40          5     90      0.0
PL/SQL lock timer                       1          1          5   4877      0.0
DFS lock handle                     3,362          0          5      1      0.4
log file sequential read              412          0          4     10      0.1
db file parallel write              2,774          0          3      1      0.3
library cache load lock                59          0          3     58      0.0
buffer busy global CR                 722          0          3      4      0.1
control file sequential read        6,398          0          3      0      0.8
SQL*Net break/reset to clien       16,078          0          2      0      2.0
name-service call wait                 26          0          2     67      0.0
control file parallel write         1,248          0          2      1      0.2
process startup                        24          0          1     49      0.0
KJC: Wait for msg sends to c        3,491          4          1      0      0.4
SQL*Net more data to client        23,724          0          1      0      2.9
buffer busy global cache               23          0          0     19      0.0
global cache null to s                114          0          0      4      0.0
PX Deq: reap credit                 5,646      5,509          0      0      0.7
log file switch completion              4          0          0     58      0.0
lock escalate retry                    54         54          0      1      0.0
IPC send completion sync              119        118          0      0      0.0
direct path read                    2,820          0          0      0      0.3
direct path read (lob)              3,632          0          0      0      0.5
PX Deq: Join ACK                       88         37          0      0      0.0
direct path write                   2,470          0          0      0      0.3
kksfbc child completion                 6          6          0      6      0.0
buffer deadlock                         3          3          0     11      0.0
global cache quiesce wait               4          4          0      8      0.0
Library Cache Activity for DB: PROD Instance: PROD2 Snaps: 149847 -149857
->"Pct Misses" should be very low
                         Get Pct        Pin        Pct               Invali-
Namespace           Requests Miss     Requests     Miss     Reloads dations
BODY                  27,353    0.5         28,091    6.5      1,643        0
CLUSTER                  203    1.0            269    1.5          0        0
INDEX                    526    9.9            271   19.9          0        0
JAVA DATA                 18    0.0            120    6.7          4        0
JAVA RESOURCE             20   45.0             56   26.8          3        0
JAVA SOURCE                1 100.0              1 100.0          0        0
PIPE                     999    0.4          1,043    0.4          0        0
SQL AREA             131,793    7.6      3,406,577    0.4      7,012        0
TABLE/PROCEDURE      926,987    0.2      1,907,993    1.0      8,845        0
TRIGGER                1,519    0.1          1,532    4.9         69        0
                    GES Lock      GES Pin      GES Pin   GES Inval GES Invali-
Namespace           Requests     Requests     Releases    Requests     dations
BODY                       1          129          277         117           0
CLUSTER                  168            2            2           2           0
INDEX                    271           52           56          52           0
JAVA DATA                  0            0            0           0           0
JAVA RESOURCE              0            9            6           0           0
JAVA SOURCE                0            1            1           1           0
PIPE                       0            0            0           0           0
SQL AREA                   0            0            0           0           0
TABLE/PROCEDURE       89,523          764          868         460           0
TRIGGER                    0            2           14           2           0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
DB Name         DB Id    Instance     Inst Num Release     Cluster Host
PROD            21184234 PROD3               3 9.2.0.8.0   YES     npi-or-db-p-
                                                                   13.npi.corp
              Snap Id     Snap Time      Sessions Curs/Sess Comment
Begin Snap:    149808 30-Oct-09 14:00:00       31 #########
End Snap:    149809 30-Oct-09 15:00:02       34 11,831.4
   Elapsed:               60.03 (mins)
Cache Sizes (end)
~~~~~~~~~~~~~~~~~
               Buffer Cache:     8,192M      Std Block Size:          8K
           Shared Pool Size:     1,024M          Log Buffer:     10,240K
Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                  Redo size:              1,518.14             36,700.35
              Logical reads:              1,333.43             32,235.02
              Block changes:                  5.09                123.01
             Physical reads:                 54.31              1,312.88
            Physical writes:                  3.91                 94.44
                 User calls:                  1.46                 35.40
                     Parses:                  2.24                 54.21
                Hard parses:                  0.04                  0.93
                      Sorts:                  0.84                 20.28
                     Logons:                  0.06                  1.45
                   Executes:                  3.11                 75.23
               Transactions:                  0.04
% Blocks changed per Read:    0.38    Recursive Call %:     94.31
Rollback per transaction %:   45.64       Rows per Sort:    215.97
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:   99.99       Redo NoWait %:    100.00
            Buffer Hit   %:   96.21    In-memory Sort %:    100.00
            Library Hit   %:   99.07        Soft Parse %:     98.29
         Execute to Parse %:   27.94         Latch Hit %:     99.98
Parse CPU to Parse Elapsd %:   69.88     % Non-Parse CPU:     97.92
Wait Events for DB: PROD Instance: PROD3 Snaps: 149808 -149809
-> s - second
-> cs - centisecond -     100th of a second
-> ms - millisecond -    1000th of a second
-> us - microsecond - 1000000th of a second
-> ordered by wait time desc, waits desc (idle events last)
                                                                   Avg
                                                     Total Wait   wait    Waits
Event                               Waits   Timeouts   Time (s)   (ms)     /txn
enqueue                            19,510      7,472     15,509    795    130.9
PX Deq: Parse Reply                 1,152      1,071      2,577   2237      7.7
row cache lock                      2,202        518      1,579    717     14.8
db file scattered read             31,556          0        354     11    211.8
db file sequential read            17,272          0         67      4    115.9
db file parallel read               1,722          0         34     20     11.6
global cache cr request            53,754         91         32      1    360.8
wait for scn from all nodes         1,897         13         10      5     12.7
CGS wait for IPC msg              403,358    401,478         10      0 2,707.1
DFS lock handle                     4,753          0          8      2     31.9
direct path read                    1,248          0          6      5      8.4
PX Deq: Execute Reply                 110         38          6     51      0.7
global cache open s                   160         10          5     31      1.1
control file sequential read        6,442          0          3      0     43.2
name-service call wait                 26          0          2     78      0.2
latch free                            129        109          2     13      0.9
KJC: Wait for msg sends to c          153         24          1      9      1.0
control file parallel write         1,245          0          1      1      8.4
buffer busy waits                     199          0          1      6      1.3
process startup                        20          0          1     44      0.1
global cache null to x                 74          2          1      9      0.5
global cache null to s                 19          0          1     29      0.1
global cache open x                   268          1          1      2      1.8
library cache lock                  1,150          0          0      0      7.7
PX Deq: Join ACK                      129         48          0      3      0.9
log file parallel write             1,157          0          0      0      7.8
async disk IO                         219          0          0      1      1.5
direct path write                   1,024          0          0      0      6.9
ksxr poll remote instances          6,740      4,595          0      0     45.2
PX Deq: reap credit                 6,580      6,511          0      0     44.2
buffer busy global CR                  73          0          0      2      0.5
log file sequential read               11          0          0     10      0.1
log file sync                         100          0          0      1      0.7
global cache s to x                   282          2          0      0      1.9
db file parallel write                 95          0          0      1      0.6
library cache pin                     142          0          0      0      1.0
SQL*Net break/reset to clien           28          0          0      1      0.2
IPC send completion sync               81         81          0      0      0.5
PX Deq: Signal ACK                     32         14          0      1      0.2
PX Deq Credit: send blkd                3          1          0      7      0.0
SQL*Net more data to client           841          0          0      0      5.6
PX Deq: Msg Fragment                   37         17          0      0      0.2
log file single write                   4          0          0      1      0.0
db file single write                    1          0          0      1      0.0
SQL*Net message from client         4,213          0     13,673   3246     28.3
gcs remote message                214,784     75,745      7,016     33 1,441.5
wakeup time manager                   233        233      6,812 29237      1.6
PX Idle Wait                        2,338      2,294      5,686   2432     15.7
PX Deq: Execution Msg               2,151      1,979      4,796   2229     14.4
Library Cache Activity for DB: PROD Instance: PROD3 Snaps: 149808 -149809
->"Pct Misses" should be very low
                         Get Pct        Pin        Pct               Invali-
Namespace           Requests Miss     Requests     Miss     Reloads dations
BODY                   1,290    0.0          1,290    0.0          0        0
CLUSTER                   18    0.0              8    0.0          0        0
SQL AREA               4,893    2.0         36,371    0.5          2        0
TABLE/PROCEDURE        1,555    3.9          3,834    4.9         71        0
TRIGGER                  286    0.0            286    0.0          0        0
                    GES Lock      GES Pin      GES Pin   GES Inval GES Invali-
Namespace           Requests     Requests     Releases    Requests     dations
BODY                       1            0            0           0           0
CLUSTER                    4            0            0           0           0
SQL AREA                   0            0            0           0           0
TABLE/PROCEDURE          863          224           42          42           0
TRIGGER                    0            0            0           0           0
          -------------------------------------------------------------
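For anyone comparing the three reports, here is a minimal sketch in Python (not part of the original Statspack output) of how the pasted wait-event rows could be parsed and ranked by total wait time. The regular expression and the function names are illustrative assumptions. The four sample rows are copied from the PROD2 section and are not the full table, so the percentages it prints are shares of the sample only.

```python
import re

# One wait-event row: event name, then Waits, Timeouts, Total Wait Time (s),
# Avg wait (ms) and Waits/txn. Numbers may contain thousands separators.
ROW = re.compile(
    r"^(?P<event>.+?)\s+(?P<waits>[\d,]+)\s+(?P<timeouts>[\d,]+)"
    r"\s+(?P<time_s>[\d,]+)\s+(?P<avg_ms>[\d,]+)\s+(?P<per_txn>[\d,.]+)$"
)


def parse_wait_events(text):
    """Return a list of (event, waits, time_s) tuples from pasted rows."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append((
                match.group("event"),
                int(match.group("waits").replace(",", "")),
                int(match.group("time_s").replace(",", "")),
            ))
    return rows


def top_events(rows, n=3):
    """Rank events by total wait time and report each one's share in percent."""
    total = sum(time_s for _, _, time_s in rows)
    ranked = sorted(rows, key=lambda row: row[2], reverse=True)[:n]
    return [(event, time_s, round(100.0 * time_s / total, 1))
            for event, _, time_s in ranked]


# Four rows copied from the PROD2 section above (a sample, not the whole table).
SAMPLE = """
enqueue                            65,823     33,667     90,459   1374      8.2
row cache lock                     38,996        560      1,795     46      4.8
PX Deq Credit: send blkd              522        499      1,223   2344      0.1
library cache lock                 78,842        172        210      3      9.8
"""

if __name__ == "__main__":
    for event, time_s, pct in top_events(parse_wait_events(SAMPLE)):
        print(f"{event:30s} {time_s:>8,d} s  {pct:5.1f}%")
```

Among these four rows the enqueue event dominates, which is consistent with the PROD2 and PROD3 tables, where enqueue is the top wait event.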

Oracle RAC scalability (is it linearly scalable up to 20-30 nodes?)

We are looking at Oracle RAC as a data storage solution for the following performance requirement.
The application generates resources which we want to store in the database and make searchable.
A resource has 2 parts: one is data and the other is meta.
Data contains textual/binary data such as txt/html/doc/excel/pdf/image files, etc.
Meta contains 30-40 different properties describing the data.
Average resource size is 10K.
Insertion speed required for such resources (data + meta) is 2Gbps (30K resources/second).
We also want indexing on data and meta.
We used a single Oracle database and created a resource table which has 40 columns for the meta properties and one column of BLOB type for the data.
Performance achieved is 100Mbps insertion speed (on a normal machine).
Now, to reach 2Gbps insertion speed, we are thinking of scaling out with Oracle RAC. Maybe 20 nodes are required to scale up to 2Gbps.
My question is: does Oracle RAC provide close to linear scalability up to 20-30 nodes or not?
The key requirement is to achieve an insertion speed of up to 2Gbps.
The high availability of Oracle RAC would be an added advantage for us, but the key concern here is scalability, not fault tolerance.
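The sizing in this question can be written out as a small calculation. This is a sketch in Python using only the figures quoted above; the unit conventions (1 Gbps = 10**9 bits per second, 10K = 10 * 1024 bytes) and the efficiency parameter are assumptions added for illustration, not measurements.

```python
import math

# Figures taken from the post; the unit conventions are assumptions.
TARGET_BPS = 2 * 10**9          # required insert rate: 2 Gbps
SINGLE_NODE_BPS = 100 * 10**6   # measured on one database: 100 Mbps
RESOURCE_BYTES = 10 * 1024      # average resource size: 10K


def resources_per_second(bits_per_second, resource_bytes):
    """Convert a bit rate into whole resources inserted per second."""
    return bits_per_second // (8 * resource_bytes)


def nodes_needed(target_bps, per_node_bps, efficiency=1.0):
    """Nodes required if every node contributes per_node_bps * efficiency.

    efficiency=1.0 is perfectly linear scaling; lower values model
    (hypothetically) throughput lost to inter-node coordination.
    """
    return math.ceil(target_bps / (per_node_bps * efficiency))


if __name__ == "__main__":
    print(resources_per_second(TARGET_BPS, RESOURCE_BYTES))
    print(nodes_needed(TARGET_BPS, SINGLE_NODE_BPS))
    print(nodes_needed(TARGET_BPS, SINGLE_NODE_BPS, efficiency=0.5))
```

With perfectly linear scaling the estimate is the 20 nodes mentioned above. Any efficiency below 1.0 raises that count, which is why the linearity question matters for the sizing.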

> We are not using Oracle partitioning now because it is slow when we define a domain index (even though the index is local, is not synced in real time, and index maintenance is off): it still maintains a pending queue, which slows down the insertion process.
Hmm.. I'm using partitioning extensively for mass parallel inserts and it is a lot cleaner than individual tables (requiring dynamic SQL), and I have not seen any performance issues.
Can you elaborate on what issues you have seen?
> Our main and key concern is to achieve an insertion speed of 2Gbps initially, which should be scalable up to 10Gbps.
May I ask what data you are collecting? This volume sounds a bit extreme - is it something like collecting/sniffing UDP/TCP packets?
> 1. If we have, say, I/O and network bandwidth available, will Oracle RAC be capable of consuming this available I/O and network bandwidth by adding more nodes?
Yes. Remember that each cluster node has its own set of local platform resources - including a pipe to the shared storage (such as an I/O fibre channel).
What will cause an impact? Anything that will impact a single insert process will also impact that process across a cluster. E.g. two processes attempting to insert a row with the same PK - only one can succeed and thus one will be blocked by the other. Or bitmap indexing, as a lock on a bitmap "slot" locks the index data for a number of rows - any of which may concurrently be updated by other processes. Etc.
How is this resolved in a cluster? As the processes are not local to the same platform, IPC cannot be used. Thus it means the Interconnect has to be used. This will be slower than IPC.
> 2. We also heard from HP that the Interconnect between nodes will be a bottleneck for us. My question is: in the above scenario, where there is only insertion and searching and no updating in the system, will the RAC Interconnect become a bottleneck just for heartbeat messaging?
The Interconnect need not be a bottleneck. Besides, the Interconnect is a fundamental cog in the share-everything cluster machine. It is not a "Bad Thing". So I question what HP is saying - for me to accept such advice, it needs to be backed up with hard technical facts.
If Interconnect is such an issue according to HP, just what do they recommend you use to scale your system with? Let me guess - some very expensive and very complex HP product? A superdome perhaps?
> 3. Indexing time is much higher, even when we create the index for one hour of data in one shot. Can we dedicate a few nodes in RAC just for indexing?
That is what I'm doing - running up to 100+ PQ processes to do the index builds and rebuilds.
> 4. When I use the CREATE TABLE AS SELECT command to transfer the same amount of data from one table to another, it takes only 30 seconds, and when I use direct path uploading it takes 3 minutes.
Make sure that your performance comparisons are valid. What do you imply with "direct path uploading"?
Remember that CTAS is disk-to-disk I/O via the SGA buffer cache. There is no "client side" involved. Nothing external. Not even a PGA buffer area. No pushing data from a client process via IPC to an Oracle server process.
If your benchmark includes pushing data from a client, even from a PL/SQL process, that will be slower than a CTAS - always.
When dealing with such large volumes, the "traditional RDBMS" approach needs to be carefully considered. Every single constraint, every single index, every single trigger results in a tiny overhead that becomes a very large overhead given the data volumes.
Data management also plays a crucial role. Unless you can manage the data, you cannot effectively insert such huge volumes, process those volumes and query those volumes.
I see RAC and partitioning and PL/SQL server side processing as crucial ingredients to make this work.

Install Oracle RAC 10g (10.2.0.1) on HP-UX B.11.31 U ia64 failed

Hi All
I am installing Oracle RAC 10g 10.2.0.1 on HP-UX B.11.31 U ia64 but cannot complete the installation.
hosts file:
#Public IPs
10.144.1.111 spgdb01
10.144.1.112 spgdb02
#Private IPs
10.144.2.2 spgdb01p
10.144.2.3 spgdb02p
#Virtual IPs
10.144.1.113 spgdb01v
10.144.1.114 spgdb02v
The installation with runInstaller runs without error; the copy and link phases are OK. When I run root.sh, it cannot complete, as follows:
Checking to see if Oracle CRS stack is already configured
Checking to see if any 9i GSD is up
Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/oracle/product/10.2.0' is not owned by root
WARNING: directory '/oracle/product' is not owned by root
WARNING: directory '/oracle' is not owned by root
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 0: spgdb01 spgdb01p spgdb01
node 1: spgdb02 spgdb02p spgdb02
Creating OCR keys for user 'root', privgrp 'sys'..
Operation successful.
Now formatting voting device: /ora/crs/votedisk01
waitpid(-1, 0x7fffdf50, WUNTRACED) .................................................................................................... [sleeping]
Now formatting voting device: /oracle/oradata1/crs/votedisk02
Now formatting voting device: /oracle/oradata2/crs/votedisk03
Format of 3 voting devices complete.
Startup will be queued to init within 30 seconds.
====================
I have waited for 10 minutes but it still has not completed.
Additionally, in the log from runInstaller I got:
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2011-04-28_12-13-31AM. Please wait ...-bash-4.2$ Oracle Universal Installer, Version 10.2.0.1.0 Production
Copyright (C) 1999, 2005, Oracle. All rights reserved.
Private Interconnect : null
Private Interconnect : null
Private Interconnect : null
Private Interconnect : null
So, please help me fix this issue
Thank you
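As a sanity check on the hosts file above, here is a minimal sketch in Python that verifies each node's VIP shares a subnet with its public address and that the private interconnect address sits on a different subnet. The /24 prefix length and the "p"/"v" suffix convention are assumptions read off the entries in the post. Passing this check does not by itself explain the "Private Interconnect : null" lines or the stalled root.sh.

```python
import ipaddress

# Entries copied from the hosts file in the post.
HOSTS = {
    "spgdb01": "10.144.1.111", "spgdb02": "10.144.1.112",    # public
    "spgdb01p": "10.144.2.2", "spgdb02p": "10.144.2.3",      # private
    "spgdb01v": "10.144.1.113", "spgdb02v": "10.144.1.114",  # virtual
}


def subnet_of(ip, prefix=24):
    """Network an address belongs to; the /24 prefix is an assumption."""
    return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)


def check_node(node, hosts=HOSTS):
    """Return a list of problems found for one node (empty if none)."""
    problems = []
    public = subnet_of(hosts[node])
    private = subnet_of(hosts[node + "p"])
    virtual = subnet_of(hosts[node + "v"])
    if virtual != public:
        problems.append(f"{node}: VIP is not on the public subnet")
    if private == public:
        problems.append(f"{node}: private address shares the public subnet")
    return problems


if __name__ == "__main__":
    for name in ("spgdb01", "spgdb02"):
        print(name, check_node(name) or "OK")
```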

I had this problem and resolved it by transporting the file to the installation server with the correct ftp datatype (binary).
On page 54 of the install guide (..Server\Oracle_Business_Intelligence\doc\doc\bi.1013\b31765.pdf) that comes with the installation files, there is an instruction to make sure that any ftp activity is done in binary.
This may not have occurred with the license.xml file if you use a tool which offers the "feature" of automatic datatype recognition.
Hope this helps.

Oracle RAC templates 11gR2 buildcluster.sh error

Hi All,
I am facing the error below while creating Oracle RAC templates. Kindly let us know how to resolve it.
===error=========================
Oracle RAC 11gR2 OneCommand (v1.2) for Oracle VM - (c) 2010-2011 Oracle Corporation
   Cksum: [1170221909 255000 racovm.sh] at Sun Jan 5 04:15:14 EST 2014
   Kernel: 2.6.18-194.0.0.0.3.el5xen (i686) [1 processor(s)] 1700 MB
2014-01-05 04:15:14:[printparams:Time :racnode1] Completed successfully in 4 seconds (0h:00m:04s)
2014-01-05 04:15:14:[setsshora:Start:racnode1] SSH Setup for the Oracle user(s)...
INFO (node:racnode1): Running as oracle: /u01/racovm/ssh/setssh-Linux.sh -s -x -c NO -h nodelist -p ***   (setup on 2 node(s): racnode1 racnode2)
ERROR: Failed to create temporary file /tmp/setssh-cretmpQY3958 on localhost, can not proceed
Exiting...
ERROR (node:racnode1): Failed to configure passwordless SSH for the oracle user
2014-01-05 04:15:17:[setsshora:Time :racnode1] Completed with errors in 3 seconds (0h:00m:03s), status: 1
2014-01-05 04:15:17:[buildcluster:Time :racnode1] Completed with errors in 58 seconds (0h:00m:58s), status: 1
thanks,
Mike.

Try this. It worked for me.
Please keep in mind that you need to wait until each step finishes successfully before moving on to the next one.
For Steps 1 and 2, you can skip any node(s) on which you haven't executed root.sh yet.
Step 1: As root, run "$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force" on all nodes, except the last one.
Step 2: As root, run "$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force -lastnode" on the last node. This command will also zero out the OCR and voting disks.
Step 3: As root, run $GRID_HOME/root.sh on all nodes, one at a time.

ASM instances on 2 node Oracle RAC 10g r2 on Red Hat 4 u1

Hi all
I'm experiencing a problem configuring diskgroups under +ASM instances on a two-node Oracle RAC.
I followed the official guide and also the official documents from the Metalink site, but I'm stuck on the visibility of the ASM disks.
I created fake disks on NFS, on NetApp certified storage, binding them to block devices with the usual trick "losetup /dev/loopX /nfs/disk1",
ran "oracleasm createdisk DISKX /dev/loopX" on one node and
"oracleasm scandisks" on the other one.
With "oracleasm listdisks" I can see the disks at OS level on both nodes, but when I try to create and mount the diskgroup in the ASM instances, all is well on the instance on which I create the diskgroup, while the other one doesn't see the disks at all, and the diskgroup mount fails with:
ERROR: no PST quorum in group 1: required 2, found 0
Tue Sep 20 16:22:32 2005
NOTE: cache dismounting group 1/0x6F88595E (DG1)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DG1 was not mounted
Any help would be appreciated.
Thanks a lot.
Antonello

I'm having this same problem. Did you ever find a solution?

Error in ONS logs while implementing FCF on Oracle RAC from Java program

I have a Java program on a client machine that reads its settings from a property file. While making the connection to the ONS port on the Oracle RAC server to implement FCF, the program throws the error below:
java.sql.SQLException: Io exception: The Network Adapter could not establish the connection
When I checked the ONS logs for that node, they show the following:
Connection 5,199.xxx.xxxxxx,8200 header RCV failed (Connection reset by peer) coFlags=1002a
These log entries are generated only when the Java program tries to connect; otherwise the daemon starts without any errors.
But sometimes it connects and gives the desired output.
Please advise, and do let me know in case you need more information.
The Java program on the client machine is as follows:
/*
 * Oracle Support Services
 */
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Enumeration;
import java.util.Properties;
import java.util.ResourceBundle;
import oracle.jdbc.pool.OracleConnectionCacheManager;
import oracle.jdbc.pool.OracleDataSource;

public class FCFConnectionCacheExample
{
  private OracleDataSource ods = null;
  private OracleConnectionCacheManager occm = null;
  private Properties cacheProperties = null;

  public FCFConnectionCacheExample() throws SQLException
  {
    // create a cache manager
    occm = OracleConnectionCacheManager.getConnectionCacheManagerInstance();
    // settings come from fcfcache.properties on the classpath
    Properties props = loadProperties("fcfcache");
    cacheProperties = new java.util.Properties();
    cacheProperties.setProperty("InitialLimit", (String)props.get("InitialLimit"));
    cacheProperties.setProperty("MinLimit", (String)props.get("MinLimit"));
    cacheProperties.setProperty("MaxLimit", (String)props.get("MaxLimit"));
    ods = new OracleDataSource();
    ods.setUser((String)props.get("username"));
    ods.setPassword((String)props.get("password"));
    ods.setConnectionCachingEnabled(true);
    ods.setFastConnectionFailoverEnabled(true);
    ods.setConnectionCacheName("MyCache");
    ods.setONSConfiguration((String)props.get("onsconfig"));
    ods.setURL((String)props.get("url"));
    occm.createCache("MyCache", ods, cacheProperties);
  }

  // copy every key of the resource bundle into a Properties object
  private Properties loadProperties (String file)
  {
    Properties prop = new Properties();
    ResourceBundle bundle = ResourceBundle.getBundle(file);
    Enumeration enumlist = bundle.getKeys();
    String key = null;
    while (enumlist.hasMoreElements())
    {
      key = (String) enumlist.nextElement();
      prop.put(key, bundle.getObject(key));
    }
    return prop;
  }

  public void run() throws Exception
  {
    Connection conn = null;
    Statement stmt = null;
    ResultSet rset = null;
    String sQuery =
      "select sys_context('userenv', 'instance_name'), " +
      "sys_context('userenv', 'server_host'), " +
      "sys_context('userenv', 'service_name') " +
      "from dual";
    try
    {
      conn = ods.getConnection();
      stmt = conn.createStatement();
      rset = stmt.executeQuery(sQuery);
      rset.next();
      System.out.println("-----------");
      System.out.println("Instance -> " + rset.getString(1));
      System.out.println("Host -> " + rset.getString(2));
      System.out.println("Service -> " + rset.getString(3));
      System.out.println("NumberOfAvailableConnections: " +
        occm.getNumberOfAvailableConnections("MyCache"));
      System.out.println("NumberOfActiveConnections: " +
        occm.getNumberOfActiveConnections("MyCache"));
      System.out.println("-----------");
    }
    catch (SQLException sqle)
    {
      // print the whole chain of SQL exceptions and their causes
      while (sqle != null)
      {
        System.out.println("SQL State: " + sqle.getSQLState());
        System.out.println("Vendor Specific code: " +
          sqle.getErrorCode());
        Throwable te = sqle.getCause();
        while (te != null) {
          System.out.print("Throwable: " + te);
          te = te.getCause();
        }
        sqle.printStackTrace();
        sqle = sqle.getNextException();
      }
    }
    finally
    {
      try
      {
        // null checks avoid a NullPointerException when getConnection() failed
        if (rset != null) rset.close();
        if (stmt != null) stmt.close();
        if (conn != null) conn.close();
      }
      catch (SQLException sqle2)
      {
        System.out.println("Error during close");
      }
    }
  }

  public static void main(String[] args)
  {
    System.out.println(">> PROGRAM using JDBC thin driver no oracle client required");
    System.out.println(">> ojdbc14.jar and ons.jar must be in the CLASSPATH");
    System.out.println(">> Press CNTRL C to exit running program\n");
    try
    {
      FCFConnectionCacheExample test = new FCFConnectionCacheExample();
      while (true)
      {
        test.run();
        Thread.sleep(10000);
      }
    }
    catch (InterruptedException e)
    {
      System.out.println("PROGRAM Ended by user");
    }
    catch (Exception ex)
    {
      System.out.println("Error Occurred in MAIN");
      ex.printStackTrace();
    }
  }
}
I have intentionally deleted some of the information, as it is confidential.
The property file is as follows:
# properties required for test
username=test
password=test
InitialLimit=10
MinLimit=10
MaxLimit=20
onsconfig=nodes=RAC-node1:port,RAC-node2:port
url=jdbc:oracle:thin:@(DESCRIPTION= \
(LOAD_BALANCE=yes) \
(ADDRESS=(PROTOCOL=TCP)(HOST=RAC-node1)(PORT=1521)) \
(ADDRESS=(PROTOCOL=TCP)(HOST=RAC-node1)(PORT=1521)) \
(CONNECT_DATA=(service_name=RAC_SERVICE)))
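
One thing that stands out in the property file as posted is that both ADDRESS entries name RAC-node1. That may simply be a side effect of the redaction, but if the real file looks the same, the client would only ever be pointed at one listener. Below is a minimal sketch of a check for this, assuming a plain connect descriptor string; the class DescriptorHostCheck and its methods are hypothetical helpers written for this illustration and are not part of the Oracle JDBC driver.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not an Oracle API.
final class DescriptorHostCheck {
    // Matches "(HOST=<name>)" inside a connect descriptor, ignoring case and spaces.
    private static final Pattern HOST =
        Pattern.compile("\\(\\s*HOST\\s*=\\s*([^)\\s]+)\\s*\\)", Pattern.CASE_INSENSITIVE);

    // Returns every HOST value in the order it appears in the descriptor.
    static List<String> hosts(String descriptor) {
        List<String> found = new ArrayList<>();
        Matcher m = HOST.matcher(descriptor);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    // Returns the lower-cased host names that appear more than once.
    static Set<String> duplicateHosts(String descriptor) {
        Set<String> seen = new LinkedHashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String host : hosts(descriptor)) {
            String key = host.toLowerCase(Locale.ROOT);
            if (!seen.add(key)) {
                duplicates.add(key);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // The URL from the property file above, joined into one line.
        String url = "jdbc:oracle:thin:@(DESCRIPTION=(LOAD_BALANCE=yes)"
            + "(ADDRESS=(PROTOCOL=TCP)(HOST=RAC-node1)(PORT=1521))"
            + "(ADDRESS=(PROTOCOL=TCP)(HOST=RAC-node1)(PORT=1521))"
            + "(CONNECT_DATA=(service_name=RAC_SERVICE)))";
        System.out.println("Hosts listed: " + hosts(url));
        System.out.println("Listed more than once: " + duplicateHosts(url));
    }
}
```

If the duplicate set is not empty, it is worth confirming that the second ADDRESS was meant to name the other RAC node.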

Hi;
Please check the notes below:
Link Errors While Installing CRS & RAC Database software [ID 438747.1]
Codeword File $TIMEBOMB_CWD,/opt/aCC/newconfig/aCC.cwd Missing Or Empty [ID 552893.1]
Regards,
Helios

Recommendations - Oracle RAC 10g on Solaris 10 Containers Logical/Local..

Dear Oracle Experts et al.,
I have a couple of questions about implementing Oracle 10g RAC on Solaris and seek your advice. We are attempting to implement Oracle 10g RAC on the Solaris OS and the SPARC platform.
1 We are wondering whether Oracle 10g RAC can be implemented in Solaris Local/Logical Containers. I was assuming that Oracle always links itself with OS binaries and libraries during software installation, and hence needs an OS image/root disk on which to install. However, with containers I assume we have a single Solaris installation and configuration, which is then shared with the containers configured within it. In such situations, how does the Oracle installation proceed? Do I need to look at a scenario where the global container/zone has the Oracle install and this image is shared across the zones/containers? If so, which OS filesystems need to be shared with these zones/containers?
Additionally, even if this approach is supported, is it a recommended approach? I am unsure about the stability and functionality of Oracle in such cases and am not able to completely conceptualize it. However, I assume there could be certain items which need to be appropriately taken care of. It would help if you could share observations from your experience.
2 The idea of RAC we are looking at is to have multiple Oracle installations on top of a native clustering solution, say Veritas Cluster/Sun Cluster. Do we still need Oracle's own cluster solution, Clusterware (ORACRS), on top of this to achieve Oracle clustering? Will I be able to install Oracle as a standalone installation on top of a native clustering solution such as Veritas Cluster/Sun Cluster?
Our requirement is to have the above-mentioned multiple Oracle installations spread across two (2) separate hardware platforms, say Node A and Node B, and to configure our cluster solution to behave as active-passive across Node A and Node B. In other words, I will configure a clustering solution such as VRTS/Sun Cluster in active-passive mode, then have 3 Oracle installations on Node A and another 3 on Node B. I will configure one database for each of these Oracle software installations (with the idea of not having Clusterware between the clustering solution, VRTS/Sun Cluster, and the Oracle installation, if that works). I will thus run 3 databases on each of these nodes. If any downtime happens on one of the nodes, say Node A, I will fail all Oracle databases and software over to the other available node, Node B in this case, using the native clustering solution, and I will want the databases to behave as they did earlier on Node A. I am not sure, though, whether I will be able to bring the databases up on Node B once the resources have been failed over from the OS perspective.
We want to use Oracle 10g RAC Release 2 EE on Solaris 10, either the latest release or the one before it.
Please share your thoughts.
Regards!
Sarat

Sarat Chandra C wrote:
Dear Oracle Experts et all
I have a couple of questions for Oracle 10g RAC implementation on Solaris and seek your advice. we are attempting to implement oracle 10g RAC on Solaris OS and SPARC Platform.
1 We are wondering if Oracle 10g RAC could be implemented on Solaris Local/Logical Containers?
My understanding is that RAC in a Zone (Container) is not supported by Oracle, and will not work anyway. Regardless of installation, RAC needs to do cluster level stuff about the cluster configuration, changing network addresses dynamically, and sending guaranteed messages over the cluster interconnect. None of this stuff can be done in a Local Zone in Solaris, because Local Zones have fewer permissions than the Global Zone. This is part of the design of Solaris Zones, and nothing to do with how Oracle RAC itself works on them.
This is all down to the security model of Zones, and Local Zones lack the ability to do certain things, to stop them reconfiguring themselves and impacting other Zones. Hence RAC cannot do dynamic cluster reconfiguration in a Local Zone, such as changing virtual network addresses when a node fails.
My understanding is that RAC just cannot work in a Local Zone. This was certainly true 5 years ago (mid 2005), and was a result of the inherent design and implementation of Zones in Solaris. Things may have changed, so check the Solaris documentation, and check if Oracle RAC is supported in Local Zones. However, as I said, this limitation was inherent in the design of Zones, so I do not see how Sun could possibly have changed it so that RAC would work in a Local Zone.
To me, your only option is the Global Zone. Which pretty much destroys the argument for having Zones on a Solaris system, unless you can host other non-Oracle applications in the other Zones.
2 The idea of RAC we are looking at is to have multiple Oracle Installations on top of native clustering solution say veritas clusters/Sun Clusters. Do we still need to have Oracle Cluster solution Clusterware (ORACRS) on top of this to achieve Oracle Clustering? Will I be able to install Oracle as a standalone installation on top of native clustering solution say veritas clusters/Sun Clusters?
I am not sure the term 'native' is correct. All 'Cluster' software is low level, and has components that run within the operating system. Whether this is Sun Cluster, Veritas Cluster Server, or Oracle Clusterware. They are all as 'native' to Solaris as each other. They all perform the same function for Oracle RAC around Cluster management - which nodes are members of the cluster, heartbeats between nodes, reliable fast message delivery, etc.
You only need one piece of Cluster software. So pick one and use it. If you use the Sun or Veritas cluster products, then you do not need the Oracle Clusterware software. But I would use it, because it is free (included with RAC), is from Oracle themselves and so guaranteed to work, is fully supported, and is one less third party product to deal with. Having an all Oracle software stack makes things simpler and more reliable, as far as I am concerned. You can be sure that Oracle will have fully tested RAC on their own Clusterware, and be able to replicate any issues in their own support environments.
Officially the Sun and Veritas products will work and are supported. But when you get a problem with your Cluster environment, who are you going to call? You really want to avoid "finger pointing" when you have a problem, with each vendor blaming the cause of the problem on another vendor. Using an all Oracle stack is simpler, and ensures Oracle will "own" all your support problems.
Also future upgrades between versions will be simpler, as Oracle will release all their software together, and have tested it together. When using third party Cluster software, you have to wait for all vendors to release new versions of their own software, and then wait again while it is tested against all the different third party software that runs on it. I have heard of customers stuck on old versions of certain cluster products, who cannot upgrade because there are no compatible combinations in the support matrices between the cluster product and Oracle database versions.
I will configure Clustering Solution like VRTS/SunCluster in Active-Passive, then have 3 Oracle installations on Node A, another 3 on Node B.
As I said before, these 3 Oracle installations will actually all be on the same Global Zone, because RAC will not go into Local Zones.
John

Oracle RAC nodes reboot when the preferred controller fails

When we disconnect both fiber cables from the preferred Controller A, or pull the Controller A card out of the disk array (IBM DS 4300), both servers reboot after 90 seconds.
During this time the complete RAC network is out of service for approximately 5 minutes. After the reboot, both servers come back up with both instances without any manual intervention.
This is a critical issue for us because we are losing high availability. Let us know how we can resolve it.
Details of the network:
1. Software - Oracle 10g Release 2
2. OS - Red Hat Linux 3 (kernel version 2.4.21-27.ELsmp)
3. Shared storage - IBM DS 4300
4. Multipathing driver - RDAC (rdac-LINUX-09.00 A5.13)
5. Nodes - IBM 346
6. Database on ASM
7. The preferred controller for the ASM, OCR and voting disks is A.
8. Hangcheck timer value is 210 seconds.
9. Both servers have 2 HBA ports. The first HBA port is connected to Controller A and the second HBA port to Controller B of the SAN disk array.
As per my understanding:
The voting disk resides in the disk array, and Controller A is the preferred owner of the voting disk LUN. When I disconnect both fiber cables from the preferred Controller A, the Clusterware software on both nodes tries to contact the voting disk. When the nodes are unable to contact the voting disk within a specific time period, they reboot.
I tested the controller failure both with the Oracle RAC software and without Oracle. Without Oracle it works fine; the reason is that in this case the disk array waits approximately 300 seconds before changing the preferred controller from A to B.
But with Oracle, the Clusterware software reboots both nodes before the controller can shift from A to B.
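
The timing argument above can be written down as a small sketch: whichever timer expires first decides the outcome. The figures are the ones quoted in this post (reboot observed after about 90 seconds, hangcheck timer at 210 seconds, controller failover after roughly 300 seconds); the class TimeoutRace and the labels are hypothetical and only serve the illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the timing race described in the post.
final class TimeoutRace {
    // Returns the label of the timer with the smallest timeout (first one wins on ties).
    static String firstToExpire(Map<String, Integer> timeoutsSeconds) {
        if (timeoutsSeconds.isEmpty()) {
            throw new IllegalArgumentException("no timeouts given");
        }
        String first = null;
        int best = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> e : timeoutsSeconds.entrySet()) {
            if (first == null || e.getValue() < best) {
                first = e.getKey();
                best = e.getValue();
            }
        }
        return first;
    }

    public static void main(String[] args) {
        Map<String, Integer> timers = new LinkedHashMap<>();
        timers.put("observed node reboot", 90);            // from the post
        timers.put("hangcheck timer", 210);                // from the post
        timers.put("controller failover A to B", 300);     // from the post, approximate
        System.out.println("Expires first: " + firstToExpire(timers));
    }
}
```

With these numbers the reboot comes well before both the 210-second hangcheck value and the 300-second controller failover, which suggests that some shorter cluster timeout is involved and that the storage failover would have to complete faster than that timeout for the nodes to stay up.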
So, in conclusion, someone with a good understanding of Oracle Clusterware on Linux and the IBM RDAC multipath driver should be able to help me.
When we install Oracle RAC on Linux, it is required to configure the hangcheck timer.
Oracle recommends 180 seconds.
This means that if one of the nodes is hanging, the second node will wait for 180 seconds; if the situation is not resolved within 180 seconds, it will reboot the hung node.
I think the hangcheck timer configuration is required only on Linux.
Configuration File
cat >> /etc/rc.d/rc.local << EOF
modprobe hangcheck-timer hangcheck_tick=15 hangcheck_margin=60
EOF

Sorry, the hangcheck timer setting is:
Configuration File
cat >> /etc/rc.d/rc.local << EOF
modprobe hangcheck-timer hangcheck_tick=30 hangcheck_margin=180
EOF
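
As far as I know, the hangcheck-timer kernel module watches the node it is loaded on and resets that node when a hang lasts longer than hangcheck_tick + hangcheck_margin seconds, so the two parameters have to be read together. A minimal sketch of that arithmetic follows; the class HangcheckTimer and its method are hypothetical names used only for this illustration.

```java
// Hypothetical helper: adds the two module parameters into one threshold.
final class HangcheckTimer {
    // Seconds a node may hang before the module resets it: tick + margin.
    static int resetThresholdSeconds(int tickSeconds, int marginSeconds) {
        if (tickSeconds <= 0 || marginSeconds < 0) {
            throw new IllegalArgumentException("tick must be > 0 and margin >= 0");
        }
        return tickSeconds + marginSeconds;
    }

    public static void main(String[] args) {
        // First setting posted: hangcheck_tick=15 hangcheck_margin=60
        System.out.println(resetThresholdSeconds(15, 60));
        // Corrected setting: hangcheck_tick=30 hangcheck_margin=180
        System.out.println(resetThresholdSeconds(30, 180));
    }
}
```

The corrected setting adds up to 210 seconds, which matches the hangcheck timer value given in the configuration list.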

RCA for Oracle RAC Performance Issue

Hi DBAs,
I have set up a 2-node Oracle RAC 10.2.0.3 on Linux 4.5 (64-bit), with 16 GB of memory and 4 dual-core CPUs per node. The database serves a web application, but unfortunately the system is on its knees. The performance is terrible. The storage is an EMC SAN, but ASM is not implemented, for fear of degrading the performance further or complicating the system.
I am seeking expert advice from the gurus on this forum to formulate an action plan for a root cause analysis of the system and database. Please advise what tools I can use to gather information about the root cause. The AWR report is not very helpful. The system statistics from top, vmstat and iostat only show high resource usage; it is difficult to find the reason. OEM has been configured and very frequently reports all kinds of high wait events.
How can I effectively find network bottlenecks (with the netstat command, which I need help to understand)?
How can I look at the system I/O (iostat) in a way that gives me useful information? I don't understand what the baseline or optimal values should be for comparing the I/O activity.
I am seeking help and advice to diagnose the issue. I also want to present this issue as a case study.
Thanks
-Samar-

First of all, RAC is mainly suited for OLTP applications.
Secondly, if your application is unscalable (it doesn't use bind variables and no SQL statements have been tuned and/or it has been ported from Sukkelserver 200<whatever>) running it against RAC will make things worse.
Thirdly: RAC uses a chatty Interconnect. If you didn't configure the Interconnect properly, and/or are using slow network cards (1 Gb is mandatory), and/or you are not using a 9k MTU on your 1 Gb NIC, this again will make things worse.
You can't install RAC 'out of the box'. It won't perform! PERIOD.
Fourthly: you might suffer from your 'application' connecting and disconnecting for every individual SQL statement and/or committing every individual INSERT or UPDATE.
You need to address this.
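
To make the fourth point concrete, here is a minimal sketch of the opposite pattern: one connection kept open, a bind variable instead of literal SQL, and one commit per batch instead of one per row. It uses only the standard java.sql interfaces; the class BatchedInsert, the table demo_names and its column are hypothetical, and the batch size is something you would have to test for your own workload.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical illustration; demo_names is an assumed table with one VARCHAR2 column.
final class BatchedInsert {
    // Number of commits needed for rowCount rows at batchSize rows per commit.
    static int commitsNeeded(int rowCount, int batchSize) {
        if (rowCount < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("rowCount >= 0 and batchSize > 0 required");
        }
        return (rowCount + batchSize - 1) / batchSize;
    }

    // Inserts all names over one already-open connection; returns the commits issued.
    static int insertAll(Connection conn, List<String> names, int batchSize) throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be > 0");
        }
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        int commits = 0;
        // One statement with a bind variable, parsed once and reused for every row.
        try (PreparedStatement ps =
                 conn.prepareStatement("insert into demo_names (name) values (?)")) {
            int pending = 0;
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch();
                    conn.commit();      // one commit per batch, not per row
                    commits++;
                    pending = 0;
                }
            }
            if (pending > 0) {          // flush the last, partly filled batch
                ps.executeBatch();
                conn.commit();
                commits++;
            }
        } catch (SQLException e) {
            conn.rollback();            // discard the uncommitted batch before rethrowing
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
        return commits;
    }

    public static void main(String[] args) {
        // Pure arithmetic, no database needed: commits for 10000 rows.
        System.out.println("row-by-row commits: " + commitsNeeded(10000, 1));
        System.out.println("batched commits:    " + commitsNeeded(10000, 500));
    }
}
```

The insertAll method needs a real connection, ideally one taken from a connection pool rather than opened per statement; the main method only prints the difference in commit counts.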
Using ADDM and/or AWR is compulsory for analysing the problem, and/or having read Cary Millsap's book Optimizing Oracle Performance is compulsory.
You won't get anywhere without AWR, and OS statistics will not provide any clue.
Because, paraphrasing William Jefferson Clinton, former president of the US of A:
It's the application, stupid.
99 out of 100 cases. Trust me. All developers I know currently are 100 percent clueless.
That said, if you can't be bothered to post the top 5 AWR events, and you aren't up to using AWR reports, maybe you should hire a consultant who can.
Regards,
Sybrand Bakker
Senior Oracle DBA
