Buffer Hit % ?

Hi Guys,
My DB Buffer Hit % (from AWR) is very low: 69% (9am to 6pm daily).
Is this ratio (<90%) a good indicator that I should increase my DB cache?
I have read several articles. Some suggest that it's a good indicator and some suggest it is not.
Can anyone give me some guidelines on what else I should look for, if it's not a good indicator that I need to add more memory?
Thanks

dbaing wrote:
hi david,
what is the event name which will be reflected under top wait events which I will consider adding ram to buffer?
thanks

You wouldn't decide based on wait events; you'd do it based on buffer cache advice. However, if you do extensive tuning of your top SQL statements (based on AWR, wait events, etc.), then afterwards it would be sensible to see how it affects the buffer cache advice.
Look at both SGA and PGA sizing, since you could easily suffer very severe performance difficulties on batch jobs and other large data volume operations due to sorts spilling to disk. In fact, if you run OLTP operations during the day and batch operations at night, then you have to be careful in considering whether SGA or PGA is over- or undersized, because it could be the case that you need a higher ratio of SGA to PGA for OLTP periods, and a lower ratio (i.e. larger PGA and smaller SGA) for batch periods. Your cache advice, when viewed over an entire 24-hour period, could look pretty good on average but be hiding different problems at different times of the day.
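
The point about a 24-hour view hiding per-period problems can be illustrated with simple arithmetic. The sketch below uses a hit percentage only because it is easy to compute; the same averaging effect would apply to advisory figures. All workload numbers and period names are invented for illustration and do not come from any real system.

```python
# Hypothetical workload figures, invented purely for illustration.
# Each entry is (logical_reads, physical_reads) for one period of the day.
periods = {
    "day (OLTP)": (9_000_000, 450_000),
    "night (batch)": (1_000_000, 600_000),
}


def hit_pct(logical_reads, physical_reads):
    """Share of logical reads that needed no physical read, as a percentage."""
    return 100.0 * (1 - physical_reads / logical_reads)


# Ratio for each period on its own.
per_period = {name: hit_pct(lr, pr) for name, (lr, pr) in periods.items()}

# Ratio computed over the whole day, as a single 24-hour report would show it.
total_logical = sum(lr for lr, _ in periods.values())
total_physical = sum(pr for _, pr in periods.values())
whole_day = hit_pct(total_logical, total_physical)

for name, pct in per_period.items():
    print(f"{name}: {pct:.1f}%")
print(f"24-hour view: {whole_day:.1f}%")
```

With these made-up numbers the whole-day figure sits close to the daytime value, while the night period on its own looks very different, which is why it can be worth looking at shorter snapshot intervals as well.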

Similar Messages

  • STATSPACK REPORT (BUFFER HIT RATIO)

    My statspack report shows that my buffer hit ratio is 83%. What factors do I need to look at to improve the buffer hit ratio? Thanks

    I deleted because i realized that i took the statspack report of 1 day period.
    Below is the Statspack report of 1 hour. Can you please let me know if i still need to increase database buffer cache?
    STATSPACK report for
    Database DB Id Instance Inst Num Startup Time Release RAC
    ~~~~~~~~ ----------- ------------ -------- --------------- ----------- ---
    4254163 TEST1 1 28-Jun-07 23:30 10.2.0.3.0 NO
    Host Name: Linux3 Num CPUs: 2 Phys Memory (MB): 7,968
    ~~~~
    Snapshot Snap Id Snap Time Sessions Curs/Sess Comment
    ~~~~~~~~ ---------- ------------------ -------- --------- -------------------
    Begin Snap: 32 03-Jul-07 11:59:13 23 11.0
    End Snap: 42 03-Jul-07 14:07:33 26 11.3
    Elapsed: 128.33 (mins)
    Cache Sizes Begin End
    ~~~~~~~~~~~ ---------- ----------
    Buffer Cache: 100M Std Block Size: 8K
    Shared Pool Size: 100M Log Buffer: 33,823K
    Load Profile                       Per Second    Per Transaction
    ~~~~~~~~~~~~                  ---------------    ---------------
                   Redo size:            1,259.57           8,598.13
               Logical reads:              148.39           1,012.92
               Block changes:                6.41              43.76
              Physical reads:               41.91             286.09
             Physical writes:                0.73               5.02
                  User calls:               15.66             106.91
                      Parses:                4.07              27.77
                 Hard parses:                0.27               1.85
                       Sorts:                1.70              11.61
                      Logons:                0.01               0.07
                    Executes:                9.59              65.47
                Transactions:                0.15
      % Blocks changed per Read:    4.32    Recursive Call %:   83.09
     Rollback per transaction %:    6.03       Rows per Sort:   11.39
    Instance Efficiency Percentages
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                Buffer Nowait %:  100.00       Redo NoWait %:  100.00
                Buffer  Hit   %:   71.77    In-memory Sort %:  100.00
                Library Hit   %:   93.15        Soft Parse %:   93.34
             Execute to Parse %:   57.58         Latch Hit %:  100.00
    Parse CPU to Parse Elapsd %:   97.12     % Non-Parse CPU:   86.74
    Shared Pool Statistics            Begin      End
                 Memory Usage %:      91.37    92.38
        % SQL with executions>1:      77.55    80.43
      % Memory for SQL w/exec>1:      83.11    84.69
    Top 5 Timed Events                                    Avg  %Total
    ~~~~~~~~~~~~~~~~~~                                   wait    Call
    Event                             Waits   Time (s)   (ms)    Time
    CPU time                                       132           48.3
    db file sequential read          89,745         91      1    33.4
    db file scattered read           29,289         35      1    13.0
    control file parallel write       2,558          6      2     2.1
    log file parallel write           2,294          3      1     1.0
    Host CPU (CPUs: 2)
    ~~~~~~~~ Load Average
    Begin End User System Idle WIO WCPU
    0.11 0.11 2.26 2.65 95.09 0.90 0.24
    Instance CPU
    ~~~~~~~~~~~~
    % of total CPU for Instance: 1.06
    % of busy CPU for Instance: 21.63
    %DB time waiting for CPU - Resource Mgr:
    Memory Statistics Begin End
    ~~~~~~~~~~~~~~~~~ ------------ ------------
    Host Mem (MB): 7,967.6 7,967.6
    SGA use (MB): 316.0 316.0
    PGA use (MB): 57.8 62.6
    % Host Mem used for SGA+PGA: 4.7 4.8
    Time Model System Stats DB/Inst: TEST1/TEST1 Snaps: 32-42
    -> Ordered by % of DB time desc, Statistic name
    Statistic Time (s) % of DB time
    sql execute elapsed time 212.3 92.7
    DB CPU 124.2 54.2
    parse time elapsed 21.6 9.4
    hard parse elapsed time 19.7 8.6
    PL/SQL execution elapsed time 4.3 1.9
    hard parse (sharing criteria) elaps 1.4 .6
    connection management call elapsed 1.4 .6
    PL/SQL compilation elapsed time 1.2 .5
    repeated bind elapsed time 0.1 .0
    hard parse (bind mismatch) elapsed 0.1 .0
    sequence load elapsed time 0.0 .0
    DB time 228.9
    background elapsed time 48.2
    background cpu time 39.3
    Wait Events DB/Inst: TEST1/TEST1 Snaps: 32-42
    -> s - second, cs - centisecond, ms - millisecond, us - microsecond
    -> %Timeouts: value of 0 indicates value was < .5%. Value of null is truly 0
    -> Only events with Total Wait Time (s) >= .001 are shown
    -> ordered by Total Wait Time desc, Waits desc (idle events last)
    Avg
    %Time Total Wait wait Waits
    Event Waits -outs Time (s) (ms) /txn
    db file sequential read 89,745 0 91 1 79.6
    db file scattered read 29,289 0 35 1 26.0
    control file parallel write 2,558 0 6 2 2.3
    log file parallel write 2,294 0 3 1 2.0
    db file parallel write 2,179 0 3 1 1.9
    log file sync 1,089 0 2 2 1.0
    os thread startup 7 0 1 120 0.0
    latch free 3 0 0 89 0.0
    SQL*Net break/reset to client 640 0 0 0 0.6
    direct path read 140 0 0 1 0.1
    control file sequential read 3,599 0 0 0 3.2
    SQL*Net more data to client 2,121 0 0 0 1.9
    db file parallel read 49 0 0 1 0.0
    cursor: pin S wait on X 2 100 0 16 0.0
    read by other session 4 0 0 5 0.0
    direct path write 24 0 0 0 0.0
    latch: shared pool 1 0 0 2 0.0
    SQL*Net message from client 120,211 0 47,282 393 106.6
    wait for unread message on broadc 7,631 100 7,517 985 6.8
    Streams AQ: waiting for messages 1,540 100 7,512 4878 1.4
    Streams AQ: qmn slave idle wait 275 0 7,508 27302 0.2
    Streams AQ: qmn coordinator idle 554 51 7,508 13553 0.5
    Streams AQ: waiting for time mana 25 52 6,643 ###### 0.0
    SQL*Net message to client 120,215 0 0 0 106.6
    class slave wait 7 0 0 1 0.0
    SQL*Net more data from client 146 0 0 0 0.1
    Background Wait Events DB/Inst: TEST1/TEST1 Snaps: 32-42
    -> %Timeouts: value of 0 indicates value was < .5%. Value of null is truly 0
    -> Only events with Total Wait Time (s) >= .001 are shown
    -> ordered by Total Wait Time desc, Waits desc (idle events last)
    Avg
    %Time Total Wait wait Waits
    Event Waits -outs Time (s) (ms) /txn
    control file parallel write 2,557 0 6 2 2.3
    log file parallel write 2,290 0 3 1 2.0
    db file parallel write 2,179 0 3 1 1.9
    os thread startup 7 0 1 120 0.0
    db file sequential read 1,456 0 1 0 1.3
    db file scattered read 25 0 0 8 0.0
    control file sequential read 156 0 0 0 0.1
    latch: shared pool 1 0 0 2 0.0
    rdbms ipc message 25,017 92 59,496 2378 22.2
    pmon timer 2,576 100 7,513 2917 2.3
    Streams AQ: qmn slave idle wait 275 0 7,508 27302 0.2
    Streams AQ: qmn coordinator idle 554 51 7,508 13553 0.5
    smon timer 26 96 7,148 ###### 0.0
    Streams AQ: waiting for time mana 25 52 6,643 ###### 0.0
    Wait Event Histogram DB/Inst: TEST1/TEST1 Snaps: 32-42
    -> Total Waits - units: K is 1000, M is 1000000, G is 1000000000
    -> % of Waits - column heading: <=1s is truly <1024ms, >1s is truly >=1024ms
    -> % of Waits - value: .0 indicates value was <.05%, null is truly 0
    -> Ordered by Event (idle events last)
    Total ----------------- % of Waits ------------------
    Event Waits <1ms <2ms <4ms <8ms <16ms <32ms <=1s >1s
    LGWR wait for redo copy 7 100.0
    SQL*Net break/reset to cli 640 99.2 .6 .2
    SQL*Net more data to clien 2121 100.0
    control file parallel writ 2558 84.2 12.0 .7 1.4 1.5 .2
    control file sequential re 3599 99.9 .1
    cursor: pin S wait on X 2 100.0
    db file parallel read 49 93.9 2.0 4.1
    db file parallel write 2179 68.2 19.9 6.8 4.0 .9 .1 .1
    db file scattered read 29K 90.7 6.0 .5 .5 .6 .8 .9
    db file sequential read 89K 89.4 2.8 1.3 3.6 1.5 .7 .6
    direct path read 140 87.1 2.9 .7 1.4 7.1 .7
    direct path write 24 100.0
    latch free 3 100.0
    latch: messages 1 100.0
    latch: shared pool 1 100.0
    log file parallel write 2294 77.4 17.3 2.0 1.3 1.1 .8 .2
    log file sync 1089 62.4 28.8 3.3 1.7 2.5 1.1 .2
    os thread startup 7 100.0
    read by other session 4 50.0 25.0 25.0
    SQL*Net message from clien 120K 95.2 1.6 .9 .3 .1 .2 .1 1.7
    SQL*Net message to client 120K 100.0
    SQL*Net more data from cli 146 100.0
    Streams AQ: qmn coordinato 554 49.1 .2 .2 50.5
    Streams AQ: qmn slave idle 275 100.0
    Streams AQ: waiting for me 1540 .2 99.8
    Streams AQ: waiting for ti 25 36.0 16.0 48.0
    class slave wait 7 85.7 14.3
    pmon timer 2577 .5 .1 .1 99.3
    rdbms ipc message 25K 2.3 1.3 1.4 .4 .4 .3 32.1 61.8
    smon timer 26 100.0
    wait for unread message on 7631 .0 .0 100.0 .0
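
As a quick sanity check, the Buffer Hit % in the report above can be approximated from the two Load Profile lines for logical and physical reads. This is a simplified sketch: it assumes direct path reads are negligible (the report shows only 140 direct path read waits), and it is not claimed to be the exact calculation Statspack performs.

```python
# Per-second figures copied from the Load Profile section of the report above.
logical_reads_per_sec = 148.39
physical_reads_per_sec = 41.91

# Simplified hit percentage: share of logical reads satisfied without a
# physical read. Ignoring direct path reads is an assumption of this sketch.
buffer_hit_pct = 100.0 * (1 - physical_reads_per_sec / logical_reads_per_sec)

print(f"Buffer Hit % (approx): {buffer_hit_pct:.2f}")
```

The result lands within rounding distance of the 71.77 shown under Instance Efficiency Percentages. At roughly 42 physical reads per second with 1 ms average read waits, the low percentage on its own says little about whether this instance has a performance problem.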

  • Why buffer Hit greater than 100% in AWR

    hi,
    could anyone help explain why the buffer Hit is 376.58%, greater than 100% in my AWR. oracle version is 10.2.0.3.0. thanks!
    Instance Efficiency Percentages (Target 100%)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                Buffer Nowait %:  100.00       Redo NoWait %:  100.00
                Buffer  Hit   %:  376.58    In-memory Sort %:  100.00
                Library Hit   %:   89.27        Soft Parse %:   90.92
             Execute to Parse %:   52.41         Latch Hit %:   99.96
    Parse CPU to Parse Elapsd %:    4.24     % Non-Parse CPU:   99.59

    sid.gd wrote:
    could anyone help explain why the buffer Hit is 376.58%, greater than 100% in my AWR. oracle version is 10.2.0.3.0. thanks!
    There are a couple of reasons why "physical reads" can exceed "logical reads": one is a side effect of dynamic sampling, the other is a side effect of running dbms_stats. You could check the captured SQL to see if it offers any evidence for either of these activities during the snapshot interval.
    Regards
    Jonathan Lewis
    http://jonathanlewis.wordpress.com
    http://www.jlcomp.demon.co.uk
    "For every expert there is an equal and opposite expert."
    Arthur C. Clarke

  • Buffer Hit % discussion

    Hi Guys,
    I have read a few sites about buffer hit % vs. database performance.
    I understand that a high hit ratio doesn't mean the performance is good. It might mean that the queries running in the database are doing a lot of unwanted, heavy I/O through the use of unselective indexes.
    However, can I conclude that a low hit ratio = bad performance, where we have to either look into the SQL <reduce logical I/O, which in turn reduces physical I/O> or add memory to the buffer cache if we confirm that all the SQL is good but the hit % is still low?
    Kindly share your thoughts.
    Thanks!

    dbaing wrote:
    However, can I conclude that a low hit ratio = bad performance, where we have to either look into the SQL <reduce logical I/O, which in turn reduces physical I/O> or add memory to the buffer cache if we confirm that all the SQL is good but the hit % is still low?
    Kindly share your thoughts.

    If you have a model of how you expect your system to behave then there may be cases, or times, when you can decide that some ratio is significantly out of the range you expect.
    As far as the BCHR is concerned, you could imagine approximating an OLTP system with the (pessimistic) assumption that a single row access requires three index block visits (root, branch, leaf) and one table block visit to pick up the row. If that's really the case you could also decide that for your model you have enough memory to buffer the indexes, but no memory to keep the tables buffered because of the extreme randomness of the table visits. If that model makes sense for you then you might expect a hit ratio of around 75%, and therefore start to worry if the ratio is significantly lower. Of course the OLTP system might run regular reports, which do large tablescans and hash joins and distort the figures; it might accumulate a large number of indexes over time, which could invalidate your "all indexes buffered" model; there are probably a number of small tables which are constantly buffered that (ought to) push the hit ratio up.
    Without a reasonable model of what your system is supposed to do at what times of day, and what variation to expect over the week, it's quite hard to make any comment about what constitutes a low or high BCHR, and it's hard to say how long and how far the figure should deviate from your expectation before you consider it to be showing threatening behaviour.
    Regards
    Jonathan Lewis
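
The 75% figure in the reply above follows directly from the stated model: three buffered index block visits and one unbuffered table block visit per row access. A minimal sketch of that arithmetic, with function and variable names chosen here for illustration:

```python
def expected_hit_pct(cached_visits, uncached_visits):
    """Hit percentage when every cached visit is a hit and every uncached one is a miss."""
    total = cached_visits + uncached_visits
    return 100.0 * cached_visits / total


# Model from the reply: root, branch and leaf index blocks are buffered,
# the table block is not.
index_block_visits = 3
table_block_visits = 1

print(expected_hit_pct(index_block_visits, table_block_visits))  # 75.0
```

Changing the assumptions changes the expectation: a deeper index or a partly cached table would push the modelled figure up, which is why the reply insists on having a model before judging the number.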

  • How should i increase over all buffer hit ratio.....

    Hi all,
    As shown below, my DB2 database buffer quality is low. How should I increase the overall buffer hit ratio?
    Please advise on any SAP standard notes or procedures.
    Number                                    1
    Total Size                           80,000  KB
    Physical Reads                         6.65  ms
    Physical Writes                        0.00  ms
    Overall Buffer Quality                86.05  %
    Data Hit Ratio                        85.79  %
    Index Hit Ratio                       87.50  %
    No Victim Buffers                259,079,295
    --rahul

    One of the options is to simply increase the bufferpool size using the following command:
    db2 "alter bufferpool <bufferpool name> immediate size <new bufferpool size>"
    However, this will only affect the hit ratio for that particular bufferpool. If you have more than one bufferpool, you need to identify the bufferpool(s) with the worst hit ratio. In the SAP DBA Cockpit, check using
    Performance -> Bufferpool
    The victim buffer information is only useful if you use alternate page cleaning.
    Note that there are other options to fight a bad bufferpool hit ratio. However, with your small bufferpool size (80MB), increasing the size may be the appropriate step.
    Malte

  • Which is a better measure of buffer hit ratio?

    which out of these gives a better measure of buffer hit ratio?
    consistent gets from cache
    db block gets from cache
    physical reads cache
    or
    consistent gets
    db block gets
    physical reads
    from v$sysstat?

    Hi,
    Well, you can always edit your reply. There is a button out there to do that.
    About the question: I am not clear on it. What do you mean by "better measure of hit ratio"?
    From the PT guide,
    http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/memory.htm#i56283
    The formula includes all three of them in the calculation,
    SELECT NAME, VALUE
      FROM V$SYSSTAT
    WHERE NAME IN ('db block gets from cache', 'consistent gets from cache', 'physical reads cache');
    Using the values in the output of the query, calculate the hit ratio for the buffer cache with the following formula:
    1 - (('physical reads cache') / ('consistent gets from cache' + 'db block gets from cache'))
    All three of them are part of the hit ratio. When you wrote the three statistics the second time, I am not sure how they were different from the first ones. All you did was remove the word "cache".
    There is no "better measure" statistic in the calculation. All three are there.
    If your question had some other meaning, then I would appreciate it if you would rephrase and elaborate on it, please.
    Aman....
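
One practical detail when applying the formula above: V$SYSSTAT values are cumulative since instance startup, so a ratio for a specific period needs the difference between two samples. The sketch below applies the quoted formula to such deltas. The function name and the sample values are invented for illustration; in practice the two dictionaries would be filled from two runs of the query shown in the reply.

```python
CACHE_STATS = (
    "consistent gets from cache",
    "db block gets from cache",
    "physical reads cache",
)


def interval_hit_ratio(begin, end):
    """Hit ratio for the interval between two V$SYSSTAT samples (name -> value)."""
    delta = {name: end[name] - begin[name] for name in CACHE_STATS}
    gets = delta["consistent gets from cache"] + delta["db block gets from cache"]
    if gets <= 0:
        raise ValueError("no cache gets in the interval; ratio is undefined")
    return 1 - delta["physical reads cache"] / gets


# Hypothetical samples, invented for illustration only.
begin = {
    "consistent gets from cache": 5_000_000,
    "db block gets from cache": 500_000,
    "physical reads cache": 200_000,
}
end = {
    "consistent gets from cache": 5_800_000,
    "db block gets from cache": 700_000,
    "physical reads cache": 500_000,
}

# Ratio since startup, using only the cumulative values of the second sample.
since_startup = 1 - end["physical reads cache"] / (
    end["consistent gets from cache"] + end["db block gets from cache"]
)

print(f"interval:      {interval_hit_ratio(begin, end):.2%}")
print(f"since startup: {since_startup:.2%}")
```

With these made-up samples the interval figure is much lower than the since-startup figure, which is one reason two people looking at the same instance can quote very different ratios.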

  • Different values for Buffer Hit Ratio

    Hi Everyone!
    I have only one buffer pool (8k), and when I execute these two queries (below), the "Buffer Hit Ratio" results are very different.
    Does anyone know why this happens? I searched the Internet and found only that V$SYSSTAT gathers data for all buffer pools
    while V$BUFFER_POOL_STATISTICS maintains statistics for each buffer pool. But I can't understand why the results are so different.
    Any suggestions?
    Thanks!
    SELECT 1- ((p.value - l.value - d.value) / s.value) "Buffer Cache Hit Ratio"
    FROM v$sysstat s, v$sysstat l, v$sysstat d, v$sysstat p
    WHERE s.name = 'session logical reads'
    AND d.name = 'physical reads direct'
    AND l.name = 'physical reads direct (lob)'
    AND p.name = 'physical reads';
    Buffer Cache Hit Ratio
    ,970655577
    SELECT NAME, PHYSICAL_READS, DB_BLOCK_GETS, CONSISTENT_GETS,
    1 - (PHYSICAL_READS / (DB_BLOCK_GETS + CONSISTENT_GETS)) "Hit Ratio"
    FROM V$BUFFER_POOL_STATISTICS;
    Hit Ratio
    ,584760575
    ORACLE VERSION: Oracle Server 9.2.0.8.0
    OS: IBM/AIX

    See metalink note STATISTIC "cache hit ratio" - Reference Note 33883.1 for the correct formula to use in your calculation.
    The ratio is of limited value. In and of itself it tells you nothing about the performance of your database.
    There have been numerous threads on the ratio in this forum. You can use search to find them if interested.
    HTH -- Mark D Powell --

  • Low buffer hit ratio

    Hi all,
    Could anyone please let me know how I can improve the buffer hit ratio in Oracle 8.0?
    I have increased my buffer size by 40% in the init.ora parameters, but there has been no gain at all in the buffer hit ratio, even though the system has been up for 42 hours since it was bounced. Could anyone advise?
    Thanks
    Akkama

    Why don't you try pinning the frequently accessed objects? Even pinning tables is possible.

  • Oracle Buffer Hit Ratio

    I am using this query to find the Oracle buffer hit ratio. Is this query right? Is there some other way of getting a more accurate ratio?
    select trunc((1-(sum(decode(name,'physical reads',value,0))/(sum(decode(name,'db block gets',value,0)) + (sum(decode(name,'consistent gets',value,0)))))) * 100) from v$sysstat

    Asif,
    Too bad we couldn't meet when I happened to come to Bdesh, because if we had, we would have talked about this same topic for hours. If you are using this hit ratio as a tuning technique, then you are not alone. There are so many DBAs who use the hit ratio and lose sleep when it is not at a particular %. It was a method that Oracle gave years ago, the reason being that at that time databases and their loads were not that large. But now things have changed, and so should the troubleshooting techniques. Search this forum for the same topic and you will see some "cool" threads where some top-notch experts are talking about the same issue.
    Aman....

  • Buffer hit ratio

    I am using the following:
    SELECT ROUND(((1-(SUM(DECODE(NAME, 'physical reads', value, 0)) /
    (SUM(DECODE(NAME, 'db block gets', value, 0))+
    (SUM(DECODE(NAME, 'CONSISTENT GETS', value, 0))))))*100), 2) || '%' BCHR
    FROM V$SYSSTAT
    to calculate the buffer hit ratio. This query is returning: -1753.28%
    Can someone explain why I am getting this crazy number?
    Thanks,
    mdp

    >>
    Many folks misunderstand that bit about "setting your own BHR", and falsely conclude that it's a useless metric. It's not useless.
    <<
    The buffer cache ratio is useful only when considered in relation to other statistics. The problem is that the majority of users seem to think that a high ratio value is good and a low ratio value is bad based on absolute values, and do not understand that the statistic is dependent on how SQL plans are being solved. If you measure the ratio when the dominant work on the system is being done via hash joins, full scans that touch the target blocks only once, or PQO, you can get a fairly low value even though the system is performing well. On the other hand, poorly performing SQL can result in a high value for the statistic. The value of the statistic bears no direct relationship to the performance of the system, and it needs to be emphasized that the ratio must be used in conjunction with other available information. The ratio by itself should be considered useless.
    >>
    If the BHR was totally useless, why does Oracle continue to include it in OEM alert thresholds, and STATSPACK and AWR reports?
    <<
    Over the years Oracle has done lots of things that turned out to be wrong so just because Oracle includes the statistics in certain products does not really provide a lot of support for the validity of the statistic. Known errors in the documentation have made it through two full releases. Again it is the misapplication of the statistic that is really at issue. Unfortunately, many poorly written DBA Administration and Tuning books in the past claimed that ratio could be used to measure database performance, and in point of fact the ratio has only a passing relationship to performance depending on the application.
    HTH -- Mark D Powell --
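
One thing worth checking in the query from the question: it looks up 'CONSISTENT GETS' in upper case, while the other two names are in lower case. Statistic names in V$SYSSTAT are lower case (as the other queries on this page use them) and DECODE compares strings exactly, so that term probably contributes 0, leaving physical reads divided by db block gets alone. The Python sketch below imitates that arithmetic. The values and helper names are invented for illustration, so it shows the effect rather than reproducing the poster's -1753.28%.

```python
# Hypothetical V$SYSSTAT contents, invented for illustration only.
sysstat = {
    "physical reads": 370_000,
    "db block gets": 20_000,
    "consistent gets": 980_000,
}


def sum_decode(stats, wanted_name):
    """Imitate SUM(DECODE(name, wanted_name, value, 0)): exact, case-sensitive match."""
    return sum(value for name, value in stats.items() if name == wanted_name)


def bchr_pct(stats, consistent_gets_name):
    """Hit ratio as a percentage, using the given spelling for consistent gets."""
    physical = sum_decode(stats, "physical reads")
    gets = sum_decode(stats, "db block gets") + sum_decode(stats, consistent_gets_name)
    return round((1 - physical / gets) * 100, 2)


print(bchr_pct(sysstat, "CONSISTENT GETS"))  # name never matches: large negative value
print(bchr_pct(sysstat, "consistent gets"))  # name matches: ordinary percentage
```

If this is indeed the cause, spelling the name in lower case in the original query should bring the result back into the 0 to 100 range.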

  • Buffer hit ratio (to be negative)

    Hi All,
    My DB Version: 10.2.0
    OS: Windows Server 2003
    When I check snapshots taken at peak time and compare them with one taken when there is no load on the server, the buffer hit ratio appears to be negative. My understanding is that the buffer cache is too small and the data in it is being aged out before it can be used, so it must be retrieved again.
    So I just want to know whether I should increase the value of db_cache_size.
    What exactly happens when I do this?

    Hi;
    DB_CACHE_SIZE specifies the size of the DEFAULT buffer pool for buffers with the primary block size (the block size defined by the DB_BLOCK_SIZE initialization parameter).
    The value must be at least 4M * number of cpus * granule size (smaller values are automatically rounded up to this value). A user-specified value larger than this is rounded up to the nearest granule size. A value of zero is illegal because it is needed for the DEFAULT memory pool of the primary block size, which is the block size for the SYSTEM tablespace.
    Source:
    http://docs.oracle.com/cd/B19306_01/server.102/b14237/initparams043.htm
    Regards
    Helios

  • Buffer Hit Ratio % -- Whats the right query ?

    What's the right query to track Buffer Hit %?
    Using this :
    prompt BUFFER HIT RATIO %
    prompt ===============
    select 100 * ((a.value+b.value)-c.value) / (a.value+b.value) "Buffer Hit Ratio"
    from v$sysstat a, v$sysstat b, v$sysstat c
    where
    a.statistic# = 38
    and
    b.statistic# = 39
    and
    c.statistic# = 40;
    Buffer Hit Ratio
    99.9678438
    However, using this :
    Select
    Round((Sum(Decode(name, 'consistent gets',value,0)) +
    Sum(Decode(name, 'db block gets',value,0)) -
    Sum(Decode(name, 'physical reads',value,0))) /
    (Sum(Decode(name, 'consistent gets',value,0)) +
    Sum(Decode(name, 'db block gets',value,0)) ) * 100, 4)
    from V$sysstat;
    Comes up as : 67.7069 %
    So which is the right one ?
    Thanks.

    user4874781 wrote:
    Well, I recently joined this organisation and that was the script that was used since long to check Buffer Hit Ratio%.
    But when I ran a TOAD report, using the other query, the value came up different.
    So am confused .. Whats the difference and which is the right one ?
    Try running the following query:
    select
            statistic#, name
    from
            v$sysstat
    where
            statistic# in (38,39,40)
    or      name in (
                    'consistent gets',
                    'physical reads',
                    'db block gets'
            )
    ;
    Then you will understand the point the previous answer was making. It's a bad idea to rely on things like the statistic# being consistent across different versions of Oracle; names tend to be safer.
    But neither query is correct. If you want any sort of vaguely meaningful "buffer cache hit ratio", you should be querying v$buffer_pool_statistics. See also: Re: Testing of buffer cache reveals these results: and http://jonathanlewis.wordpress.com/2007/09/02/hit-ratios/
    Regards
    Jonathan Lewis
    http://jonathanlewis.wordpress.com
    http://www.jlcomp.demon.co.uk
    "The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge." Stephen Hawking.

  • Windows TCP Socket Buffer Hitting Plateau Too Early

    Note: This is a repost of a ServerFault Question edited over the course of a few days, originally here: http://serverfault.com/questions/608060/windows-tcp-window-scaling-hitting-plateau-too-early
    Scenario: We have a number of Windows clients regularly uploading large files (FTP/SVN/HTTP PUT/SCP) to Linux servers that are ~100-160ms away. We have 1Gbit/s synchronous bandwidth at the office and the servers are either AWS instances or physically hosted in US DCs.
    The initial report was that uploads to a new server instance were much slower than they could be. This bore out in testing and from multiple locations; clients were seeing stable 2-5Mbit/s to the host from their Windows systems.
    I broke out iperf -s on an AWS instance and then from a Windows client in the office:
    iperf -c 1.2.3.4
    [ 5] local 10.169.40.14 port 5001 connected with 1.2.3.4 port 55185
    [ 5] 0.0-10.0 sec 6.55 MBytes 5.48 Mbits/sec
    iperf -w1M -c 1.2.3.4
    [ 4] local 10.169.40.14 port 5001 connected with 1.2.3.4 port 55239
    [ 4] 0.0-18.3 sec 196 MBytes 89.6 Mbits/sec
    The latter figure can vary significantly on subsequent tests, (Vagaries of AWS) but is usually between 70 and 130Mbit/s which is more than enough for our needs. Wiresharking the session, I can see:
    iperf -c: Windows SYN - Window 64kb, Scale 1 - Linux SYN, ACK: Window 14kb, Scale: 9 (*512)
    iperf -c -w1M: Windows SYN - Windows 64kb, Scale 1 - Linux SYN, ACK: Window 14kb, Scale: 9
    Clearly the link can sustain this high throughput, but I have to explicitly set the window size to make any use of it, which most real world applications won't let me do. The TCP handshakes use the same starting points in each case, but the forced one scales
    Conversely, from a Linux client on the same network a straight iperf -c (using the system default 85kb) gives me:
    [ 5] local 10.169.40.14 port 5001 connected with 1.2.3.4 port 33263
    [ 5] 0.0-10.8 sec 142 MBytes 110 Mbits/sec
    Without any forcing, it scales as expected. This can't be something in the intervening hops or our local switches/routers and seems to affect Windows 7 and 8 clients alike. I've read lots of guides on auto-tuning, but these are typically about disabling scaling altogether to work around bad terrible home networking kit.
    Can anyone tell me what's happening here and give me a way of fixing it? (Preferably something I can stick in to the registry via GPO.)
    Notes
    The AWS Linux instance in question has the following kernel settings applied in sysctl.conf:
    net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216
    net.core.rmem_default = 1048576
    net.core.wmem_default = 1048576
    net.ipv4.tcp_rmem = 4096 1048576 16777216
    net.ipv4.tcp_wmem = 4096 1048576 16777216
    I've used dd if=/dev/zero | nc redirecting to /dev/null at the server end to rule out iperf and remove any other possible bottlenecks, but the results are much the same. Tests with ncftp (Cygwin, Native Windows, Linux) scale in much the same way as the above iperf tests on their respective platforms.
    First fix attempts.
    Enabling CTCP - This makes no difference; window scaling is identical. (If I understand this correctly, this setting increases the rate at which the congestion window is enlarged rather than the maximum size it can reach)
    Enabling TCP timestamps. - No change here either.
    Nagle's algorithm - That makes sense and at least it means I can probably ignore that particular blips in the graph as any indication of the problem.
    pcap files: Zip file available here: https://www.dropbox.com/s/104qdysmk01lnf6/iperf-pcaps-10s-Win%2BLinux-2014-06-30.zip (Anonymised with bittwiste, extracts to ~150MB as there's one from each OS client for comparison)
    Second fix attempts.
    I've enabled ctcp and disabled chimney offloading: TCP Global Parameters
    Receive-Side Scaling State : enabled
    Chimney Offload State : disabled
    NetDMA State : enabled
    Direct Cache Acess (DCA) : disabled
    Receive Window Auto-Tuning Level : normal
    Add-On Congestion Control Provider : ctcp
    ECN Capability : disabled
    RFC 1323 Timestamps : enabled
    Initial RTO : 3000
    Non Sack Rtt Resiliency : disabled
    But sadly, no change in the throughput.
    I do have a cause/effect question here, though: The graphs are of the RWIN value set in the server's ACKs to the client. With Windows clients, am I right in thinking that Linux isn't scaling this value beyond that low point because the client's limited CWIN prevents even that buffer from being filled? Could there be some other reason that Linux is artificially limiting the RWIN?
    Note: I've tried turning on ECN for the hell of it; but no change, there.
    Third fix attempts.
    No change following disabling heuristics and RWIN autotuning. Have updated the Intel network drivers to the latest (12.10.28.0) with software that exposes functionality tweaks via device manager tabs. The card is an 82579V Chipset on-board NIC (I'm going to do some more testing from clients with Realtek or other vendors).
    Focusing on the NIC for a moment, I've tried the following (Mostly just ruling out unlikely culprits):
    Increase receive buffers to 2k from 256 and transmit buffers to 2k from 512 (Both now at maximum) - No change
    Disabled all IP/TCP/UDP checksum offloading. - No change.
    Disabled Large Send Offload - Nada.
    Turned off IPv6, QoS scheduling - Nowt.
    Further investigation
    Trying to eliminate the Linux server side, I started up a Server 2012R2 instance and repeated the tests using iperf (cygwin binary) and NTttcp.
    With iperf, I had to explicitly specify -w1m on both sides before the connection would scale beyond ~5Mbit/s. (Incidentally, I could be checked and the BDP of ~5Mbits at 91ms latency is almost precisely 64kb. Spot the limit...)
    The ntttcp binaries showed no such limitation. Using ntttcpr -m 1,0,1.2.3.5 on the server and ntttcp -s -m 1,0,1.2.3.5 -t 10 on the client, I can see much better throughput:
    Copyright Version 5.28
    Network activity progressing...
    Thread Time(s) Throughput(KB/s) Avg B / Compl
    ====== ======= ================ =============
    0 9.990 8155.355 65536.000
    ##### Totals: #####
    Bytes(MEG) realtime(s) Avg Frame Size Throughput(MB/s)
    ================ =========== ============== ================
    79.562500 10.001 1442.556 7.955
    Throughput(Buffers/s) Cycles/Byte Buffers
    ===================== =========== =============
    127.287 308.256 1273.000
    DPCs(count/s) Pkts(num/DPC) Intr(count/s) Pkts(num/intr)
    ============= ============= =============== ==============
    1868.713 0.785 9336.366 0.157
    Packets Sent Packets Received Retransmits Errors Avg. CPU %
    ============ ================ =========== ====== ==========
    57833 14664 0 0 9.476
    8MB/s puts it up at the levels I was getting with explicitly large windows in iperf.
    Oddly, though, 80MB in 1273 buffers = a 64kB buffer again. A further wireshark shows a good, variable RWIN coming back from the server (Scale factor 256) that the client seems to fulfil; so perhaps ntttcp is misreporting the send window.
    Further PCAP files have been provided, here:https://www.dropbox.com/s/dtlvy1vi46x75it/iperf%2Bntttcp%2Bftp-pcaps-2014-07-03.zip
    Two more iperfs, both from Windows to the same Linux server as before (1.2.3.4): one with a 128k socket size and default 64k window (restricts to ~5Mbit/s again) and one with a 1MB send window and default 8kB socket size (scales higher).
    One ntttcp trace from the same Windows client to a Server 2012R2 EC2 instance (1.2.3.5). Here, the throughput scales well. Note: NTttcp does something odd on port 6001 before it opens the test connection. Not sure what's happening there.
    One FTP data trace, uploading 20MB of /dev/urandom to a near-identical Linux host (1.2.3.6) using Cygwin ncftp. Again the limit is there. The pattern is much the same using Windows FileZilla.
    Changing the iperf buffer length does make the expected difference to the time sequence graph (much more vertical sections), but the actual throughput is unchanged.
    So we have a final question through all of this: Where is this limitation creeping in? If we simply have user-space software not written to take advantage of Long Fat Networks, can anything be done in the OS to improve the situation?
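    For reference, this is roughly what an application does when it asks for larger socket buffers itself, which is what iperf's -w option is documented as doing. It is a minimal sketch using Python's standard socket module, not a fix for the problem described: whether it lifts the limit on the Windows client in question is untested, and the 1 MiB value simply mirrors the -w1m used above.

```python
import socket

REQUESTED = 1 << 20  # 1 MiB, mirroring the -w1m value used with iperf in the post

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # Ask the OS for larger send/receive buffers before connecting.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, REQUESTED)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, REQUESTED)
    # The OS may clamp or adjust the request, so read back what was granted.
    granted_snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    granted_rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
finally:
    sock.close()

print(f"requested {REQUESTED}, granted send={granted_snd} recv={granted_rcv}")
```

    Software that never makes this request is left with whatever default the OS applies, which is the situation the question above is asking about.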

    Hi,
    Thanks for posting in Microsoft TechNet forums.
    I will try to involve someone familiar with this topic to further look at this issue. There might be some time delay. Appreciate your patience.
    Thank you for your understanding and support.
    Kate Li
    TechNet Community Support

  • One reason why buffer hit ratio just might be bogus

    Hi.
    Still trawling through statspack trying to make some sense of it ...
    But looking at this (on undo segments), I have perhaps another reason to sneer at the old buffer cache hit ratio (in the sense of using it to tune, or of using it to justify action or lack of action):
    "The header is also a data block that is frequently modified, so it generally remains in the buffer cache. Therefore, gets of the rollback segment header block increase the buffer cache hit ratio; this artificial increase in the hit ratio can mislead you into thinking that you have allocated enough data blocks to the buffer cache."
    (taken from the OP Exam 033 text, p.295)
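    The effect described in that quote can be illustrated with made-up numbers. This is a sketch only: the workload figures are hypothetical, and the formula is the classic ratio of logical reads that avoid a physical read.

```python
def hit_ratio(logical_reads: int, physical_reads: int) -> float:
    """Classic ratio: fraction of logical reads not needing a physical read."""
    return 1 - physical_reads / logical_reads


# Hypothetical workload: 100,000 gets on data blocks, 40,000 of which miss the cache.
data_gets, data_misses = 100_000, 40_000
# Hypothetical extra gets on one hot header block that is always cached.
header_gets = 300_000

print(f"data blocks only:     {hit_ratio(data_gets, data_misses):.1%}")
print(f"with header gets too: {hit_ratio(data_gets + header_gets, data_misses):.1%}")
```

    In this example the ratio moves from 60.0% to 90.0% although the data blocks miss the cache exactly as often as before.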

    Hi Dan,
    The problem is about the usefulness of ratios in general; it's not specifically about the BCHR. . . .
    This whole thing arose from the Oracle7 performance tuning classes where Oracle Corporation suggested that the BCHR be kept over 90%. . . .
    Is the BCHR "bogus"? It is what it is, a ratio. . . .
    I agree with Ben, it does have limited use as an "indicator" that the buffer cache MIGHT be too small to cache the working set of frequently-referenced data blocks. I have my notes here:
    http://www.dba-oracle.com/t_buffer_cache_hit_ratio_value.htm
    Oh, and as to the buffer cache advisory, it's not perfect (as is any predictive model) and I rarely see one that does not suggest that adding RAM will reduce physical I/O.
    Anyway, in just a few years all this will be a moot issue, especially since solid-state disk has hit $100 per gig. Without spinning platters, there is no need for a large buffer cache at all . . . .
    Hope this helps. . .
    Donald K. Burleson
    Oracle Press author
    Author of "Oracle Tuning: The Definitive Reference":
    http://www.dba-oracle.com/bp/s_oracle_tuning_book.htm

  • Manual calculation of 'Buffer Cache hit ratio'

    Using: Oracle 10.2.0.1.0, Redhat 4, 64bit.
    Manual calculation of ‘Buffer Cache hit ratio’ is very far off from what is shown in statspack.
    Statspack shows:
    Instance Efficiency Percentages
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                Buffer Nowait %:  100.00       Redo NoWait %:   99.97
                Buffer  Hit   %:   99.18    In-memory Sort %:  100.00
                Library Hit   %:   90.43        Soft Parse %:   52.56
             Execute to Parse %:   80.14         Latch Hit %:   99.98
    Parse CPU to Parse Elapsd %:   96.21     % Non-Parse CPU:   56.89
    Manual calculation (Got this formula from Sybex PT book, page 275):
    SQL> select name, 1-(PHYSICAL_READS/(DB_BLOCK_GETS+CONSISTENT_GETS))
    from  v$buffer_pool_statistics;
    NAME                 1-(PHYSICAL_READS/(DB_BLOCK_GETS+CONSISTENT_GETS))
    DEFAULT                                                      .700247215
    Any idea why using v$buffer_pool_statistics gives wrong results?
    Thanks regards,

    SYS@oradocms11> select (P1.value + P2.value - P3.value)/(P1.value + P2.value)*100
    from v$sysstat P1, v$sysstat P2, v$sysstat P3
    where P1.name = 'db block gets'
    and P2.name = 'consistent gets'
    and P3.name = 'physical reads';
    (P1.VALUE+P2.VALUE-P3.VALUE)/(P1.VALUE+P2.VALUE)*100
    99.6977839
    SYS@oradocms11> select name, 1 - (PHYSICAL_READS/(DB_BLOCK_GETS+CONSISTENT_GETS)) from v$buffer_pool_statistics;
    NAME 1-(PHYSICAL_READS/(DB_BLOCK_GETS+CONSISTENT_GETS))
    DEFAULT .997009542
    In my case, both are giving almost the same result.
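    One possible explanation for the original mismatch, not confirmed anywhere in this thread, is that the v$ views hold counters accumulated since instance startup while a Statspack report covers only the interval between two snapshots. The sketch below uses hypothetical counter values to show how the same formula can give very different answers over the two windows; the dictionary keys and function names are illustrative.

```python
def hit_ratio(db_block_gets: int, consistent_gets: int, physical_reads: int) -> float:
    """1 - physical reads / logical reads, as in the queries above."""
    return 1 - physical_reads / (db_block_gets + consistent_gets)


def interval_hit_ratio(begin: dict, end: dict) -> float:
    """Same formula applied to the counter deltas between two snapshots."""
    delta = {name: end[name] - begin[name] for name in begin}
    return hit_ratio(**delta)


# Hypothetical cumulative counters at two snapshot times.
begin = {"db_block_gets": 1_000_000, "consistent_gets": 9_000_000, "physical_reads": 3_000_000}
end = {"db_block_gets": 1_200_000, "consistent_gets": 10_800_000, "physical_reads": 3_016_000}

print(f"cumulative since startup: {hit_ratio(**end):.4f}")
print(f"during the interval:      {interval_hit_ratio(begin, end):.4f}")
```

    Note that the two formulas quoted in the thread, (gets + consistent - physical) / (gets + consistent) and 1 - physical / (gets + consistent), are algebraically identical, so any difference has to come from the inputs rather than from the formula.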

Maybe you are looking for

  • I just had a new hard drive installed on my iMac

    i just had a new hard drive installed on my iMac (due to the Seagate replacement program) i was able to migrate my back up(i think) but its now acting like a new computer and i can't get past the registration screen. I filled everything out but the "

  • Problems getting laptop started after ipod load

    I downloaded iTunes from the Apple website and my machine would barely restart. I restarted my computer in Safe Mode and was able to delete the iTunes directory. Problem resolved. I then made a horrible mistake of starting my laptop with the iPod cd

  • Use QuickTime Pro to extract audio from VCD

    Can I use QuickTime Pro to extract the audio from a VCD? after that, I would like to convert that audio file to an mp3.

  • A few how to questions on CaptureDevice: Microsoft.Devices

    Hi. I'm using CaptureDevice to capture videos ( vcDevice = CaptureDeviceConfiguration.GetDefaultVideoCaptureDevice(); 1. How to I set the recording resolution? (I tried setting the canvas/videobrush resolution but the playback is still normal) 2. How

  • PS8 - Licensing has stopped working os Mac OS X 10.10.2 - couldn't pass "solution 2"

    I've tried to follow the "Error "Licensing has stopped working" | Mac OS", but I can't go through "Solution 2: Run the Licence Repair tool", since I get  error /usr/bin/python: can't find '__main__' module in '/Volumes/LicenseRecovery 11.6.1/LicenseR