HANG PROBLEM BETWEEN SOLARIS 2.6 AND HP/UX & SNI

Product : SQL*NET
Date written : 2002-10-16
PURPOSE
This note describes the hang problem seen on Solaris 2.6 and
explains in-band and out-of-band breaks.
Explanation
Problem Description
When connecting from Solaris 2.6 to an HP/UX or SNI system,
the connection hangs.
Alternatively, ORA-3113 may be raised, or a hang may occur during
a select.
The cause is that either the Sun or the HP/Pyramid TCP stack handles
un-ack'd urgent data messages incorrectly. The issue is currently
under discussion between Sun and HP.
( un-ack'd urgent data messages : messages that affect the SQL*Net
out-of-band break/reset mechanism and leave the client and
server processes out of sync.
The break/reset mechanism is implemented by an urgent data message
(sent with the TCP MSG_OOB flag set) that precedes an 11-byte normal
data packet (NSPTMK).)
This happens on connections from Solaris to HP/UX & SNI because the
two sides handle urgent messages on the same socket differently.
On HP & SNI, if the first byte of the initial 11 bytes is lost,
error 12569: header checksum error is raised.
(Server Manager does not show this problem because it connects in a
different way.)
The problem can be confirmed by taking a level 16 trace on both the
client and the server.
Client trace file:
~~~~~~~~~~~~~~~~~~
nsrdr: NSPTDA flags: 0x0
nsrdr: normal exit
nsdo: what=1, bl=2038
nsdo: nsctxrnk=0
nsdo: normal exit
nioqrc: exit
nioqsn: entry
nioqbr: entry
nioqbr: state = normal (0)
nioqsm: entry
nioqsm: Sending break packet (1)...
nsdo: entry
nsdo: cid=0, opcode=67, bl=1, what=18, uflgs=0x100, cflgs=0x3
nsdo: rank=64, nsctxrnk=0
nsdo: nsctx: state=8, flg=0x420d, mvd=0
nsdo: gtn=127, gtc=127, ptn=10, ptc=2047
nsdo: sending ATTN
nsdo: 127 urgent byte to transport
nsdo: nsctxrnk=0
nsdo: normal exit
nioqbr: exit
nioqrs: entry
nioqrs: state = interrupted (1)
nioqrs: nioqrs: Not in send state ...
nioqsm: entry
nioqsm: Sending reset packet (2)...
nsdo: entry
nsdo: cid=0, opcode=67, bl=1, what=17, uflgs=0x0, cflgs=0x3
nsdo: rank=64, nsctxrnk=0
nsdo: nsctx: state=8, flg=0x420d, mvd=0
nsdo: gtn=127, gtc=127, ptn=10, ptc=2047
nsdofls: entry
nsdofls: DATA flags: 0x0
nsdofls: normal exit
nsdo: sending NSPTMK packet
nspsend: entry
nspsend: plen=11, type=12
nttwr: entry
nttwr: socket 11 had bytes written=11
nttwr: exit
nspsend: 11 bytes to transport
nspsend: packet dump
nspsend:00 0B 00 00 0C 00 00 00 |........|
nspsend:01 00 02 00 00 00 00 00 |........|
nspsend: normal exit
nsdoacts: entry
nsdofls: entry
nsdofls: DATA flags: 0x0
nsdofls: normal exit
nsdoacts: flushing transport
nttctl: entry
nsdoacts: normal exit
nsdo: nsctxrnk=0
nsdo: normal exit
nioqrs: nioqrs: sucking for reset marker...
nioqar: entry
nioqar: nioqar: suck pipe til I get a reset...
nscontrol: entry
nscontrol: cmd=1, lcl=0x0
nscontrol: normal exit
nscontrol: entry
nscontrol: cmd=3, lcl=0x2
nscontrol: normal exit
nsdo: entry
nsdo: cid=0, opcode=85, bl=0, what=0, uflgs=0x0, cflgs=0x3
nsdo: rank=64, nsctxrnk=0
nsdo: nsctx: state=8, flg=0x420d, mvd=0
nsdo: gtn=127, gtc=127, ptn=10, ptc=2047
nsdo: switching to application buffer
nsrdr: entry
nsrdr: recving a packet
nsprecv: entry
nsprecv: reading from transport...
nttrd: entry
nttrd: socket 11 had bytes read=11
nttrd: exit
nsprecv: 11 bytes from transport
nsprecv: tlen=11, plen=11, type=12
nsprecv: packet dump
nsprecv:00 0B 00 00 0C 00 00 00 |........|
nsprecv:01 00 02 00 00 00 00 00 |........|
nsprecv: normal exit
nsrdr: got NSPTMK packet
nsrdr: normal exit
nsdo: what=17, bl=1
nsdo: nsctxrnk=0
nsdo: normal exit
nioqar: exit
nscontrol: entry
nscontrol: cmd=2, lcl=0x0
nscontrol: normal exit
nsnactl: entry
nactl_internal: entry
naeectl: entry
naeectl: exit
naecctl: entry
naecctl: exit
nactl_internal: exit
nsnactl: error exit
nserror: entry
nserror: nsres: id=0, op=77, ns=12630, ns2=0; nt[0]=0, nt[1]=0, nt[2]=0
nioqer: entry
nioqer: incoming err = 0
nioqce: entry
nioqce: exit
nioqer: returning err = 0
nioqer: exit
nioqrs: exit
nioqrc: entry
nsdo: entry
nsdo: cid=0, opcode=85, bl=0, what=0, uflgs=0x0, cflgs=0x3
nsdo: rank=64, nsctxrnk=0
nsdo: nsctx: state=8, flg=0x420d, mvd=0
nsdo: gtn=127, gtc=127, ptn=10, ptc=2047
nsdo: switching to application buffer
nsrdr: entry
nsrdr: recving a packet
nsprecv: entry
nsprecv: reading from transport...
nttrd: entry
The client hangs at this point.
Server trace file:
~~~~~~~~~~~~~~~~~~
nsdo: switching to application buffer
nsrdr: entry
nsrdr: recving a packet
nsprecv: entry
nsprecv: reading from transport...
nttrd: entry
nttrd: socket 12 had bytes read=10
nttrd: exit
nsprecv: 10 bytes from transport
-<ERROR>- nsprecv: header checksum error
nsprecv:packet hdr
nsprecv:0B 00 00 0C 00 00 00 00 |........|
nsprecv: error exit
nserror: entry
-<ERROR>- nserror: nsres: id=0, op=68, ns=12569, ns2=0; nt[0]=0, nt[1]=0, nt[2]=0
nsrdr: error exit
nsdo: nsctxrnk=0
nsdo: error exit
osnqrc: interrupt came in while in ns...
osnqhp: entry
osnqhp: handling break in state sent (1)
osnqhp: exit
osnqrc: exit
osnqrs: entry
osnqrs: state = interrupted (1)
osnqrs: osnqrs: sending reset marker...
osnqsm: entry
osnqsm: Sending reset packet (2)...
nsdo: entry
nsdo: cid=0, opcode=67, bl=1, what=17, uflgs=0x0, cflgs=0x3
nsdo: rank=64, nsctxrnk=0
nsdo: nsctx: state=8, flg=0x420c, mvd=0
nsdo: gtn=143, gtc=143, ptn=10, ptc=2048
nsdofls: entry
nsdofls: DATA flags: 0x0
nsdofls: normal exit
nsdo: sending NSPTMK packet
nspsend: entry
nspsend: plen=11, type=12
nttwr: entry
nttwr: socket 12 had bytes written=11
nttwr: exit
nspsend: 11 bytes to transport
nspsend:packet dump
nspsend:00 0B 00 00 0C 00 00 00 |........|
nspsend:01 00 02 00 00 00 00 00 |........|
nspsend: normal exit
nsdoacts: entry
nsdofls: entry
nsdofls: DATA flags: 0x0
nsdofls: normal exit
nsdoacts: flushing transport
nttctl: entry
nsdoacts: normal exit
nsdo: nsctxrnk=0
nsdo: normal exit
osnqsm: exit
osnqrs: osnqrs: sucking for reset marker...
osnqar: entry
osnqar: osnqar: suck pipe til I get a reset...
nsdo: entry
nsdo: cid=0, opcode=85, bl=0, what=0, uflgs=0x0, cflgs=0x3
nsdo: rank=64, nsctxrnk=0
nsdo: nsctx: state=8, flg=0x420c, mvd=0
nsdo: gtn=143, gtc=143, ptn=10, ptc=2048
nsdo: switching to application buffer
nsrdr: entry
nsrdr: recving a packet
nsprecv: entry
nsprecv: reading from transport...
The server hangs at this point.
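The packet dumps in the two traces above can be compared with a short sketch. It is an illustration only: the field layout (bytes 0-1 as a big-endian packet length, byte 4 as the packet type) is an assumption inferred from the plen=11, type=12 values printed next to the dumps, not taken from Oracle documentation, and the names RESET_MARKER and read_header are hypothetical.

```python
# The 11 bytes shown in the nspsend/nsprecv packet dumps of this note.
RESET_MARKER = bytes.fromhex("000B00000C000000010002")


def read_header(packet: bytes) -> dict:
    """Read the two fields the traces report next to each dump.

    Assumed layout (inferred from the dumps, not from Oracle documentation):
    bytes 0-1 hold the packet length (big-endian), byte 4 holds the type.
    """
    return {
        "plen": int.from_bytes(packet[0:2], "big"),
        "type": packet[4],
        "received": len(packet),
    }


# What the client sent, and what the receiver sees if the first byte is lost.
intact = read_header(RESET_MARKER)
truncated = read_header(RESET_MARKER[1:])

print("intact   :", intact)
print("truncated:", truncated)
```

With all 11 bytes present the fields read plen=11, type=12, matching the client trace. With the first byte missing every field shifts by one position, so the length no longer agrees with the 10 bytes actually received, which is consistent with the server rejecting the header.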
Workaround
Add the following parameter to the
$ORACLE_HOME/network/admin/sqlnet.ora file on the client side.
DISABLE_OOB = ON
DISABLE_OOB = ON tells SQL*Net not to use out-of-band breaks.
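A quick way to confirm the setting is in place is to scan the file for it. The sketch below is a simplified reader, assuming one NAME = VALUE pair per line, '#' comments, and case-insensitive matching; it does not handle multi-line parenthesised values, and the helper names sqlnet_params and oob_disabled are hypothetical.

```python
def sqlnet_params(text: str) -> dict:
    """Collect simple NAME = VALUE lines from the text of a sqlnet.ora file.

    Simplification: '#' starts a comment, and multi-line or parenthesised
    values are not handled.
    """
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        params[name.strip().upper()] = value.strip()
    return params


def oob_disabled(text: str) -> bool:
    """True when DISABLE_OOB is present and set to ON."""
    return sqlnet_params(text).get("DISABLE_OOB", "").upper() == "ON"


# Illustrative file content, not copied from a real system.
sample = """# client-side sqlnet.ora
TRACE_LEVEL_CLIENT = 16
DISABLE_OOB = ON
"""

print(oob_disabled(sample))
```

In practice the text would come from reading $ORACLE_HOME/network/admin/sqlnet.ora on the client.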
What is a break?
A break is the SQL*Net function that lets a user stop a
transaction before it completes.
SQL*Net triggers a break when the user presses the interrupt key
(Control-C on most operating systems).
SQL*Net supports both in-band breaks and out-of-band breaks.
Which one is used depends on whether the OS environment supports
break requests.
What are in-band breaks?
In-band breaks are ordinary data messages sent from the client to
the server using the normal write and read functions provided by the
networking protocol.
With an Oracle server process that uses in-band breaks, the RDBMS on
the server pauses its work every few millionths of a second and
checks with the SQL*Net driver whether a break request is present.
Oracle MTS uses in-band breaks.
However, because multi-threaded server processes handle requests
asynchronously, it looks as if out-of-band breaks were being used.
Because in-band breaks are sent to the server as ordinary data
packets, they can be implemented on any protocol.
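The polling behaviour described above can be sketched as follows. This is a toy model, not SQL*Net code: the BREAK message, the fetch_rows function and the poll_every interval are hypothetical, and the network is simulated by appending to a queue.

```python
from collections import deque

BREAK = "BREAK"  # hypothetical in-band break message


def fetch_rows(total_rows, break_arrives_at=None, poll_every=100):
    """Process rows, checking the inbound queue for a break every poll_every rows.

    Returns (rows_processed, interrupted).
    """
    inbound = deque()
    done = 0
    while done < total_rows:
        if done == break_arrives_at:
            # Simulated network: the break arrives as ordinary data and
            # simply waits in the queue until the server looks at it.
            inbound.append(BREAK)
        if done % poll_every == 0 and inbound:
            if inbound.popleft() == BREAK:
                return done, True
        done += 1
    return done, False


print(fetch_rows(1000, break_arrives_at=250))
print(fetch_rows(1000))
```

The break that arrives at row 250 is only noticed at the next poll, so the polling interval bounds how quickly an in-band break can take effect.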
What are out-of-band breaks?
Out-of-band breaks are breaks sent from the client to the server
using the protocol stack's urgent data messages.
Every platform and protocol that supports out-of-band breaks
provides a way to notify the Oracle process of a break request sent
by the client.
A server that offers out-of-band breaks does not call the SQL*Net
driver to check for break requests. Instead, it relies on the
operating system's notification system.
Many TCP/IP stacks for DOS/MS Windows do not provide the API calls
needed to implement urgent writes, so SQL*Net TCP/IP for Windows
cannot use out-of-band breaks.
Example
Reference Documents
Note.68059.1
Note.61444.1

Similar Messages

  • HELP!! -- Solaris 8 10/01 (Intel) Installation Hangs My PC!!

    I downloaded the Solaris 8 for x86 yesterday and burned them into CDs.
    I have the following hardware configuration.
    - Intel i815E mainboard
    - Intel PIII 733
    - KingMax 256M 133MHz
    - Maxtor 30G hard disk
    - Acer 50X CDROM
    The installation initially gave me a warning.
    WARNING: ACPI Table not in Reclaim Memory
    And it gave more warning during the installation process:
    Resource Conflicting -- Both devices are added
    The conflicting devices are:
    - PNP0C01
    - ISY0050
    Unfortunately, I don't know what they are. :(
    However, these are not the whole story. The installer went ahead to
    let me fdisk my 30G hard disk.
    I accepted the "Solaris Default" and answered Y to the question "Can
    swap slice start at the beginning of the disk?". Immediately, the
    system hangs there but the keyboard is responsive that I can type
    anything I want.
    In addition, I use the 512M as swap as suggested by the installer.
    Can anyone help me with this situation? I have tried installing this
    version of Solaris for several times but with no luck. :S
    Any insights are highly appreciated! Thanks!
    - Peter

    Try this workaround for your problem and let me know the outcomes.
    Solaris bootstrap uses Advanced Configuration and Power Interface (ACPI)
    for device configuration and MP interrupt routing.
    Try to disable your ACPI.
    Steps to follow:
    1) Use either eeprom(1M):
    # eeprom acpi-user-options=0x2
    ( OR )
    * Modify the Device Configuration Assistant (DCA) boot floppy.
    Manually edit the /boot/solaris/bootenv.rc to add:
    setprop acpi-user-options 0x2
    This method applies at the time of Solaris 8 installation.
    Download the DCA for Solaris 8 10/01 from this URL:
    http://soldc.sun.com/support/drivers/dca_diskettes/
    The ACPI conflicts problem is not fixed and is resolved by disabling ACPI.
    In the example you gave, while it is annoying, it probably does no harm. In
    some cases it causes duplicate devices to show up, like two floppy or IDE
    drives - neither of which work properly. Disabling ACPI is the safest way
    to ensure there will be no problems if this message shows up.
    2) Reboot the System.
    Hope it helps.
    Senthilkumar
    Developer Technical Support
    Sun Microsystems, Inc.
    http://www.sun.com/developers/support

  • Solaris 10 x86 06/06 GRUB hangs

    Dear forum,
    I need help regarding Solaris 10 x86 06/06 OS GRUB hangs.
    I installed Solaris 10 06/06 OS on my SFX4200 server. After installation, I installed the Solaris 10 Recommended patches and rebooted the system. My next task was to mirror the OS, which I did successfully with the guide from SunSolve document id:83605. After the mirroring process I rebooted the system and it booted up successfully.
    Then I registered my Solaris 10 OS using the Sun Update Connection Manager. Next, I clicked the "Check For Updates" button and saw a message saying it was analyzing the system. Then my office internet connection had intermittent problems for 2 hours. I checked Sun Update Connection again and it had been at around 50% for more than 2 hours.
    So I decided to reboot the system. While rebooting, the system hangs at the word "GRUB". I tried pressing Enter, Control-C and Control-D, but nothing prompts me and the system cannot boot the OS.
    Please help me to troubleshoot the problem and guide me.
    Thanks in advance.

    Hi Eldar,
    Thanks for your reply.
    I followed your instructions: I booted from the CD-ROM, ran the "installgrub" command and rebooted my server.
    After the reboot I saw the grub> prompt, and now I can type commands.
    Here are the steps:
    grub> root
    (hd0,0,a): Filesystem type unknown, partition type 0x0
    grub> setup (hd0)
    checking if "/boot/grub/stage1" exists... yes
    checking if "/boot/grub/stage2" exists... yes
    checking if "/boot/grub/ufs_stage1_5" exists... yes
    Running "install /boot/grub/stage1 (hd0) (hd0)1+15 p (hd0,0,a)/boot/grub/stage2 /boot/grub/menu.lst"... failed
    Error 6: Mismatched or corrupt version of stage1/stage2
    grub>
    How can i recover from the above mentioned problem ?
    Note: My hard disk was mirrored before disaster happened.
    Question:
    1. Previously, after I installed and mirrored the OS, every reboot showed a GUI menu asking me to choose what to boot. Now I only see a black background and grub>. How can I recover the GRUB menu, so that I can choose the options whenever the system boots?
    Need your help.

  • Oracle Solaris 10 1/13 (x86) VM hangs

    Hello,
    I have downloaded Oracle VM Template for Oracle Solaris 10 1/13 (x86) from Oracle Software Delivery Cloud
    and created a VM out of it. When I start it the blue Oracle Solaris screen is displayed in the console and the VM hangs.
    OVS version: 3.2.6
    Hardware: Sun Server X3-2 (Formerly Sun Fire X4170 M3)
    VM config:
    Memory:4GB
    CPU:4
    Disk:40GB

    Yes, but it did not help.
    I did finally manage to boot Solaris 10 1/13 by changing "xvda" to "hda"
    in the vm.cfg. That said, there's still a problem: the OS takes over 20 minutes
    to boot.

  • Sessions hangs with library cache lock

    Dear all,
    11.1.0.7 rac on solaris 10
    Our workflow session hung yesterday on a particular step; the session was waiting on a library cache lock (found using the query select event,p1,p2 from v$session where sid=<my_sid>;).
    I then checked the blocking session, using note 122793.1 and http://oracle-study-notes.blogspot.com/2009/05/resolving-library-cache-lock-issue.html and http://oracle-study-notes.blogspot.com/2009/05/find-session-holding-library-cache-lock.html .
    I found that
    SQL> SELECT SID,USERNAME,TERMINAL,PROGRAM FROM V$SESSION
      2   WHERE SADDR in
      3    (SELECT KGLLKSES FROM X$KGLLK LOCK_A
      4     WHERE KGLLKREQ > 0
      5       AND EXISTS (SELECT LOCK_B.KGLLKHDL FROM X$KGLLK LOCK_B
      6                   WHERE KGLLKSES = '&SADDR_OF_BLKING_SESS'
      7                   AND LOCK_A.KGLLKHDL = LOCK_B.KGLLKHDL
      8                   AND KGLLKREQ = 0)
      9    );
    Enter value for saddr_of_blking_sess: 0000000770E494E0
    old   6:                  WHERE KGLLKSES = '&SADDR_OF_BLKING_SESS'
    new   6:                  WHERE KGLLKSES = '0000000770E494E0'
      SID USERNAME        TERMINAL   PROGRAM
      817 SYS             UNKNOWN    oracle@tabsdb07
                                      (J002)
      828 SYS             UNKNOWN    oracle@tabsdb07
                                      (J001)
    After killing the session, the library cache locks still remained. When I ran a trace on the session:
    select /*+ all_rows ordered */ A.rowid, :1, :2, :3
    from
    "DBMRPT"."DBM_BIAUTO_SUSP" A , "DBMRPT"."DBM_CDR_FILE_HEAD" B where(
      "A"."CDR_TYPE" is not null and "A"."FILE_ID" is not null) and(
      "B"."CDR_TYPE" (+)= "A"."CDR_TYPE" and "B"."FILE_ID" (+)= "A"."FILE_ID")
      and( "B"."CDR_TYPE" is null or "B"."FILE_ID" is null)
    call     count       cpu    elapsed       disk      query    current        rows
    Parse        1      0.01       0.01          0          0          0           0
    Execute      1      0.00       0.00          0          0          0           0
    Fetch        0      0.00       0.00          0          0          0           0
    total        2      0.01       0.01          0          0          0           0
    Misses in library cache during parse: 1
    Misses in library cache during execute: 1
    Optimizer mode: ALL_ROWS
    Parsing user id: SYS   (recursive depth: 3)
    Elapsed times include waiting on following events:
      Event waited on                             Times   Max. Wait  Total Waited
      ----------------------------------------   Waited  ----------  ------------
      row cache lock                                  5        0.00          0.00
      db file sequential read                   295932636        0.07       5066.63
      gc cr grant 2-way                          727813        0.02        233.95
      latch: gc element                              80        0.00          0.00
      latch: gcs resource hash                      870        0.00          0.00
      latch free                                      2        0.00          0.00
      gc remaster                                     9        2.00         12.91
      gcs drm freeze in enter server mode             9        0.54          2.08
      latch: object queue header operation           66        0.00          0.05
      latch: cache buffers chains                    15        0.03          0.20
      resmgr:internal state change                   63        0.10          5.30
      latch: cache buffers lru chain               1260        0.00          0.01
    ********************************************************************************
    Please guide
    Kai
    All this time the sql_id for the session remained on this SQL:
    ALTER TABLE DBMRPT.DBM_BIAUTO_SUSP ENABLE CONSTRAINT DBS1_DCFH_FK ..

    hi..
    Go through [http://orainternals.wordpress.com/2009/06/02/library-cache-lock-and-library-cache-pin-waits/]
    Anand

  • Copy command hangs

    Hello All,
    I'm having problems copying data between
    oracle 8.1.6.1 dbs, both running on Red
    Hat 6.2.
    I'm using syntax like
    copy from username/password@server -
    insert applications -
    using select * from applications
    This works fine when copying between
    oracle 8.1.6.2 dbs running on solaris,
    but on linux the command hangs and
    starts sucking up CPU.
    Thanks in advance,
    Asif.

    thx Kaj for your quick and helpful reply...
    now i just use xcopy with /Q to not display so much information!!
    but what's the good way to read both stdout and stderr at the same time and store them as useful informations!?
    thx again : )

  • SAPINST hangs during selecing export media for SRM 7.0

    Hi - I'm doing a fresh install of SRM 7.0 SR1 on the Solaris SPARC platform. Sapinst hangs after I provide the path for the export DVD. I have deleted all the contents of the installation directory and started afresh, and it still hangs at the same stage. While it is hanging, no logs are generated in the installation directory. Any help will be appreciated.
    Thanks

    Have you checked the SAPDODS.log? Paste the log.
    Also, if possible, change the R3load and dboraslib files and try again. But first check the SAPDODS.log.
    Regards,
    Neel
    Edited by: Neelabha Banerjee on Mar 4, 2011 8:25 AM

  • Problems installing on COMPAQ 5253 ( Presario )

    I can't install Solaris 8 on my COMPAQ Presario 5253 with the following configuration:
    K6II 450Mhz
    128 MB ram
    8GB Disk
    DVDRom Drive
    I have a partition ( 6GB ) for Windows 98. The remaining space is unpartitioned. I tried to install in 2 ways, but both cause problems, as follows:
    1- Boot from Software CD 1 of 2
    =====================
    Booting from the CD displays messages about booting Intel Solaris 8, etc., but it hangs with the following message before displaying the menu for choosing the type of installation ( 1-interactive 2-jumpstart):
    Runtime error R6003
    integer divide by 0
    The root filesystem is not mounted and the configuration assistant has exited prematurely.
    Booting is unlikely to succeed. CTR-ALT-DEL may be used to reset the machine.
    Failover to boot interpreter - type ctrl-d to resume boot
    2- Boot from diskette: Device Configuration assistant
    ======================================
    The system boots and recognizes all devices. When I select my CD drive to boot from and press F2_continue, Solaris starts the boot process, and when it tries to load the Solaris kernel it hangs with a CORE DUMPED message. This error occurs right after an animated backslash ( /\- animated shapes in a loop ) has been displayed for some time.
    I don't know what I can do to install my system. Thanks in advance if anyone can help me...
    Gilsomar N Resende

    I solved both problems as follows:
    1- Boot from software CD 1 of 2
    ========================
    System boot from software CD and after some screens it shows a RUN-TIME error R6003.
    I changed IDE configurations for:
    HD - IDE 1 master
    CD - IDE 2 slave
    Zip - IDE 3 secondary master
    DVD - IDE 4 secondary slave
    In my old configuration, CD was configured as IDE 3 secondary master and ZipDrive as IDE 2 slave. After this change the system boots without problems.
    2- Boot from diskette: Device Configuration assistant
    ======================================
    After solving problem 1, the installation halted at the same point ( booting from CD or diskette ). When the Device Configuration Assistant showed all the devices in my system, I removed the configuration for the LAN adapter. After this the installation completed without problems.
    I'm sure that I have problems with the driver for my LAN adapter. I'll check the compatibility list and buy another one that can be used instead.
    GNR

  • Unexpected result from pstack

    Hi All, not sure if this is the right forum......
    I'm maintaining a large(ish) system written in C++ running on Solaris 10. One process occasionally hangs. We suspect it's waiting on a mutex.
    So - I'd like to use pstack in the shell script that restarts this particular process when the system's monitor decides this process is hung or dead. I'm hoping to find evidence of a call to Mutex_lock(...).
    I have built a simple model. It just kicks off a thread, which immediately hangs ...
    #include <iostream>
    #include <thread.h>
    #include <unistd.h>
    using namespace std;
    mutex_t v_mutex;
    extern "C" void* workerthread(void* v)
    {
       cout << "Worker thread applying for mutex.\n";
       mutex_lock(&v_mutex);      // blocks: main() already holds the mutex
       cout << "Worker thread got mutex.\n";
       mutex_unlock(&v_mutex);
       return 0;
    }
    int main(void)
    {
       thr_create(NULL,
                  0,
                  workerthread,
                  0   ,
                  THR_DETACHED,
                  0);
       mutex_lock(&v_mutex);      // taken here and never released
       pause();
    }
    This makes the model process hang on the worker thread's request for the lock. However, in pstack, I don't see the call. (In several articles on the web, the suggestion is that I should be seeing it) This is what I get from pstack...
    10337:     ./a.out
    -----------------  lwp# 1 / thread# 1  --------------------
    ff041104 pause    ()
    00011028 main     (1, ffbff3b4, ffbff3bc, 21400, ff3a0700, ff3a0740) + 38
    00010b38 _start   (0, 0, 0, 0, 0, 0) + 108
    -----------------  lwp# 2 / thread# 2  --------------------
    ff040408 lwp_park (0, 0, 0)                                           <--- I'd expect to see the mutex_lock() here ?
    00010f94 workerthread (0, feefc000, 0, 0, 10f68, 1) + 2c
    ff040368 _lwp_start (0, 0, 0, 0, 0, 0)
    Just as a sanity check, a session with dbx shows what I was expecting, but this is no good for working from a script...
    Attached to process 10337 with 2 LWPs
    t@1 (l@1) stopped in _pause at 0xff041104
    0xff041104: _pause+0x0004:      ta       %icc,0x00000008
    Current function is main
       34      pause();
    (dbx) thread t@2
    Current function is workerthread
       16      mutex_lock(&v_mutex);                           <--- There it is !
    t@2 (l@2) stopped in __lwp_park at 0xff040408
    0xff040408: __lwp_park+0x0010:  ta       %icc,0x00000008
    (dbx) exit
    So does anyone see why I'm not getting the expected result from pstack?
    Thanks very much
    Jeff Adams

    BTW, albeit I don't know your situation in full, I don't think using dbx in a script is impractical. It has -c "command" option and it can read commands from a stream:
    $ dbx - 1234 -c "where -l"
    $ dbx - 1234 < commands.txt
    The first command should print a stack trace of the process with pid=1234, and the second will execute every command from the file commands.txt on the same process.
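If a script only needs to know whether a thread is blocked, the pstack text can also be scanned directly. A minimal sketch, assuming (as in the output posted above) that a thread waiting in mutex_lock() shows lwp_park as its top frame; other blocking calls can park a thread too, so a hit is a hint rather than proof. The function name parked_threads is hypothetical.

```python
import re

# A pstack frame line: hex address, function name, then the argument list.
FRAME = re.compile(r"^[0-9a-f]+\s+(\w+)\s*\(")


def parked_threads(pstack_output):
    """Return (thread header, calling function) for threads parked in lwp_park."""
    threads = []  # list of (header, [function names, innermost first])
    for line in pstack_output.splitlines():
        text = line.strip()
        if text.startswith("-----"):
            threads.append((text.strip("- "), []))
        elif threads:
            match = FRAME.match(text)
            if match:
                threads[-1][1].append(match.group(1))
    return [
        (header, frames[1] if len(frames) > 1 else None)
        for header, frames in threads
        if frames and frames[0].endswith("lwp_park")
    ]


# The pstack output quoted in the question.
sample = """10337:     ./a.out
-----------------  lwp# 1 / thread# 1  --------------------
ff041104 pause    ()
00011028 main     (1, ffbff3b4, ffbff3bc, 21400, ff3a0700, ff3a0740) + 38
00010b38 _start   (0, 0, 0, 0, 0, 0) + 108
-----------------  lwp# 2 / thread# 2  --------------------
ff040408 lwp_park (0, 0, 0)
00010f94 workerthread (0, feefc000, 0, 0, 10f68, 1) + 2c
ff040368 _lwp_start (0, 0, 0, 0, 0, 0)
"""

print(parked_threads(sample))
```

The restart script could save the pstack output to a file before killing the process and pass its contents to this function, which reports each parked thread together with the function that was running when it parked.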

  • Scheduling work from interrupt context?

    Hi all -
    I have a block driver which sits above the normal block
    drivers, and passes requests to them. The issue is as follows:
    In a driver, I maintain a queue of buffers and then set up the
    biodone() callback to process this queue. The issue I'm
    seeing is that if I call the strategy of the lower level block
    driver from biodone(), it causes hangs. My guess is
    that this is because biodone is called from interrupt context and
    the strategy routine can block. A stack trace from the hang looks
    like this:
    biowait
    sd_send_scsi_cmd
    sd_send_scsi_TEST_UNIT_READY
    sd_ready_and_valid
    sd_mapblockaddr_iostart
    xbuf_iostart
    sdstrategy
    bdev_strategy
    OUR_BIODONE_HANDLER
    biodone
    sd_return_command
    sdint
    glm_doneq_empty
    glm_intr
    pci_intr_wrapper
    intr_thread
    I'm planning to change the design so that a worker thread
    is used to clear the queue instead of using the biodone()
    callback. Can someone point me to the correct way of
    doing this?
    Further info: problem occurs mostly on Solaris 5.9 sparc. The
    hang is not a sudden hang but the machine slowly grinds to
    a halt, and the stack trace is from mdb, it looked most relevant.
    If this is not the proper forum, any pointers to other contacts in
    sun will be welcome.

    I already had Google selected there. This still uses dot com for the resulting search page that opens, I wanted to have it open with google dot ca.
    I just found this add-on from [http://mycroft.mozdev.org/search-engines.html?name=google.ca MyCroft Project]
    Part way down this page, number 5, is [http://mycroft.mozdev.org/jsreq.html Google.ca Search Bar] . Click the previous link to add this engine to the list of the FireFox search engines (on the drop down menu) did the trick, so I guess I solved my own problem. See attached menu capture. I hope this helps someone else, as it took me quite a while to find. It wasn't found in any of the standard Add-Ons you can find at the usual Mozilla multiple pages of add-ons

  • Network glitch every two minutes

    I'm trying to troubleshoot an issue whereby applications sending data between 2 Solaris 10 x86 machines (v40z) will hang for about 1 second every two minutes. The interval between the hangs is very regular (2 minutes almost to the millisecond) but it doesn't land on a predictable second each test (eg, 12:34:52, 12:36:52 in one test, 13:13:20, 13:15:20 in another).
    It appears to be independent of the application. I can see it during long file transfers using scp as well as internal applications that push steady streams of data. The usual checks with prstat and such don't show anything spiking on the nic, cpu, etc. Packet captures don't show anything unusual coming or going at the times of the hangs. It lasts for 1-1.5 seconds.
    The July Recommended set is applied.
    Has anyone else seen anything like this?

    FYI, this ended up being two different situations. In one spot there was a bogus route in the table that was theoretically benign, but removing it made the spikes disappear. The other situation was resolved by removing ipmp from a high-load situation.
    My conclusion is that there is something in the driver or stack that is configured to happen every two minutes which can make itself obvious when in the presence of certain networking misconfigurations. I have not patched and retested the ipmp stuff since the original post, but I don't know that that would necessarily be conclusive if it still existed, what with all the ipmp patches coming out lately.

  • Required packages for sconadm

    Hello,
    Could someone please post a list of required packages to register a LPS using sconadm through an http proxy. I am using a 6/06 sparc build based on the "Reduced Network Core" cluster.
    I cannot find the info in any documentation and not everybody installs every package on the distribution CD's. So far I have found 13 authentication-type packages and their prerequisites. Doesn't that seem a little ridiculous? Why doesn't sconadm's package have the pre-req packages defined?
    Thanks,
    Jeff

    I am a long-time avid Sun evangelist, but I have to say, this issue with the patches have given me the greatest headache I have ever had from Sun.
    I really wish you would go back to allowing us to download at least the minimum recommended clusters until you figure this whole patch management thing out. (ie: 10_Recommended.zip). It was a VERY easy way for us System Administrators to keep all of our systems on the same rev level and on our own schedule.
    Anyways, the lists here seem to be correct except for ONE missing package, at least on the 06/06 release of Solaris 10.
    I was still hanging running the 'sconadm register -a -r /tmp/reg' until I added the package 'SUNWjdmk-base'
    After installing that package along with the others listed above I was able to get my 'SUNWCreq' cluster to install and register using sconadm. I do have other packages installing, so I cannot guarantee that there are no other dependencies.
    Here is a full list of additional packages I'm installing as part of my Jet build (SUNWCreq+):
    base_config_profile_add_packages="SUNWtoo SUNWxwplr SUNWxwfnt SUNWxwplt SUNWxwice SUNWxwrtl SUNWmfrun SUNWj3rt SUNWj5rt SUNWj5rtx SUNWppror SUNWpprou SUNWxcu4 SUNWppro-plugin-sunos-base SUNWccccfg SUNWccccr SUNWccccrr SUNWccsign SUNWcctpx SUNWgm4 SUNWgmake SUNWbindr SUNWbind SUNWctpls SUNWscpu SUNWgtar SUNWsshcu SUNWsshdr SUNWsshdu SUNWsshr SUNWsshu SUNWdoc SUNWman SUNWbash SUNWmysqlr SUNWmysqlu SUNWntpr SUNWntpu SUNWwgetr SUNWgcmn SUNWwgetu SUNWtcpd SUNWbinutils SUNWscpr SUNWpl5u SUNWhea SUNWapchr SUNWapchu SUNWapchd SUNWaclg SUNWopensslr SUNWopenssl-commands SUNWopenssl-include SUNWopenssl-man SUNWsensor SUNWsensorr SUNWscnsomr SUNWscnsom SUNWsamr SUNWsam SUNWscn-base-r SUNWscn-base SUNWcacaort SUNWscnprmr SUNWscnprm SUNWccfw SUNWccfwctrl SUNWccinv SUNWcsmauth SUNWbrg SUNWlur SUNWadmfr SUNWadmfw SUNWadmc SUNWluu SUNWpoolr SUNWpool SUNWluzone SUNWzoner SUNWzoneu SUNWipfh SUNWj3dev SUNWjato SUNWjhrt SUNWjhdev SUNWmctag SUNWtcatu SUNWmconr SUNWmcon SUNWmcosx SUNWmcos SUNWpsvrr SUNWpsvru SUNWjdmk-base"
    I hope this helps!
    Sun, just remember, most of your boxes are "Servers" and any Sys. Admin worth his weight, will not install a User cluster on their server and there are plenty of us small/medium size business that aren't ready to move up to a full N1 implementation and only manage < 20 servers.

  • Installation hang on Solaris 10

    I've been trying to install Solaris 10 on a Compaq(HP) ProLiant ML350 G3, so far without success. During the pre-boot Configuration process, I get a few messages about conflicts between a couple of 'devices' identified only as PNP0A00 and PNP0303. I've read elsewhere that these can be safely ignored ... so I've assumed that's correct and let 'em go.
    When the boot-from-CD process begins, the first (and last) text I see is "Loading Driver ata.bef." At that point the CD activity light stops flashing and the machine hangs hard - it has to be physically powered down. I've seen on other sites that this condition will clear up after 10 or 20 minutes, but in my case it has persisted indefinitely.
    This computer, a ProLiant ML350 G3, has 2 hyperthreaded 2.2Ghz Intel Xeon CPUs, 2.8 GB of memory, a Compaq Smart Array 642 SCSI Raid controller with 4 72GB Drives in Raid 5 (seen as two volumes of 16 and 56 GB), Hitachi GCR 8480-B CD-ROM drive (IDE), ATI Rage XL PCI video, standard PS/2 keyboard and mouse, Compaq NC7760 Gigabit ethernet adapter (Broadcom).
    I really need some help here...Any suggestions (short of trying a different machine altogether)?
    Thanks in advance,
    jeff

    Hi Philippe,
    Thanks for your reply.
    I installed the whole suite including directory server in a non-default install directory. I am wondering if changing install directory will cause issues.
    --Zhiyuan

  • How to find out the info regarding Solaris, server hang or shutdown.

    Hi,
    I manage a Solaris 9 server remotely.
    Last week the server suddenly stopped responding; we tried ping and ssh, and neither worked. In the end we asked the Data Centre team to hard-reboot it to resolve the issue.
    I would like to know how to find out what caused the server to hang and need the reboot, i.e. whether there is any log file, etc.
    I have checked /var/adm/messages, but found no details about it.
    Thanks
    Rajan

    No core files?
    No hope for an answer.
    As you learned in the other Internet forum.
    http://www.linuxquestions.org/questions/solaris-opensolaris-20/how-to-find-out-the-info-regarding-solaris-server-hang-or-shutdown.-621500/
    To get such core files analyzed,
    you would need to use your service contract and log a support case with Sun.
    They have the special software tools to do that.

  • Thin driver / 8i / Solaris hangs for 60 seconds

    I am having the same problem that I have also seen in these two messages:
    http://technet.oracle.com:89/ubb/Forum8/HTML/002149.html http://technet.oracle.com:89/ubb/Forum8/HTML/001335.html
    Using the thin driver to connect to Oracle 8.1.6 on Solaris 7, every several hundred operations or so will hang for 60 seconds. A stack trace will reveal it is hung on "socket.read()", most frequently when it is trying to close a Statement but not exclusively. After 60 seconds, or if another database operation on a different connection proceeds, it will wake up and continue until the next freeze. This problem occurs with jdk1.2.2 and jdk1.3 (Sun VMs), several different versions of the thin driver (tried classes111.zip, classes12.zip, classes102.zip, downloaded the most recent the other day), and will happen from a remote server or on localhost. A patch upgrade the other day to 8.1.6 seemed to make the problem less frequent but it persists.
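
    The symptom described above, a thread parked in a socket read until something else wakes it, can be reproduced with plain java.net sockets. The sketch below is not the thin driver and does not claim to fix the problem; it only shows how a blocking read on a silent connection can be bounded with Socket.setSoTimeout. The class name ReadTimeoutSketch, the method readTimesOut and the loopback "silent server" are hypothetical names made up for illustration, and the try-with-resources syntax assumes a newer Java than the jdk1.2.2/jdk1.3 mentioned in the post.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutSketch {

    // Returns true if a read on a connection that never sends data
    // gave up after timeoutMillis instead of blocking forever.
    public static boolean readTimesOut(int timeoutMillis) throws IOException {
        // Hypothetical stand-in for a peer that accepts the connection
        // but never answers (assumption: loopback, ephemeral port).
        try (ServerSocket silentServer =
                     new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(silentServer.getInetAddress(),
                                        silentServer.getLocalPort());
             Socket accepted = silentServer.accept()) {

            // Bound every blocking read on this socket.
            client.setSoTimeout(timeoutMillis);

            InputStream in = client.getInputStream();
            try {
                in.read();      // blocks: the peer sends nothing
                return false;   // only reached if data or EOF arrived
            } catch (SocketTimeoutException expected) {
                return true;    // the read was abandoned after the timeout
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(500));
    }
}
```

    Whether the thin driver exposes its underlying socket or an equivalent setting depends on the driver version, so treat this purely as a way to observe the blocking-read behaviour in isolation.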
