ICMP poller on Solaris 10: lots of unexplained ping failures

Hello everybody, I hope someone can help me with the following problem.
I use the IBM Tivoli Netcool Network Manager ICMP poller on a Solaris SPARC local zone server and, for a reason I don't understand, I regularly get lots of unexplained ping failures.
Note that the poller is configured to poll about 9000 IP addresses every 4 minutes. Often, some of the IP addresses (generally not the same ones) don't reply to the ICMP request for 2-10 seconds at most. The tool itself does not seem to be the problem because, when I test the affected IP addresses myself with the ping command, I do get the following message, but only for a few seconds:
"icmp host unreachable from gateway yvasl110" (yvasl110 is the name of the local zone server)
I notice that the ipOutNoRoutes counter increases very often on this specific server:
netstat -s -P ip
IPv4 ipForwarding = 2 ipDefaultTTL = 255
ipInReceives =3113915 ipInHdrErrors = 0
ipInAddrErrors = 0 ipInCksumErrs = 0
ipForwDatagrams = 0 ipForwProhibits = 187
ipInUnknownProtos = 3 ipInDiscards = 3495
ipInDelivers =4391757 ipOutRequests =3059887
ipOutDiscards = 0 ipOutNoRoutes =117387
ipReasmTimeout = 15 ipReasmReqds = 0
ipReasmOKs = 0 ipReasmFails = 0
ipReasmDuplicates = 0 ipReasmPartDups = 0
ipFragOKs = 0 ipFragFails = 0
ipFragCreates = 0 ipRoutingDiscards = 0
tcpInErrs = 0 udpNoPorts = 4495
udpInCksumErrs = 0 udpInOverflows = 0
rawipInOverflows = 0 ipsecInSucceeded = 0
ipsecInFailed = 0 ipInIPv6 = 0
ipOutIPv6 = 0 ipOutSwitchIPv6 = 0
Note that I have another identical server with the same tool and the same list of IP addresses, and it has no problem: no ping failures and ipOutNoRoutes = 0.
I have already analyzed the network connection (and already switched to another network connection: network card + switch) = no effect.
I installed the latest Solaris patches = no effect.
And the Solaris TCP parameters are the same on the 2 servers.
So, I don't understand. :o(
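
One way to narrow this down might be to snapshot the counters before and after a burst of ping failures and see which ones moved. The following is only a minimal sketch: it assumes the `name = value` layout of the `netstat -s -P ip` output pasted above, and the second snapshot value (117399) is made up for illustration.

```python
import re

def parse_netstat_counters(text):
    """Return {counter_name: int} for every 'name = value' pair in the text."""
    return {name: int(value)
            for name, value in re.findall(r"([A-Za-z]\w*)\s*=\s*(\d+)", text)}

def counter_deltas(before, after):
    """Counters whose value changed between two snapshots, with the change."""
    return {name: after[name] - before[name]
            for name in after
            if name in before and after[name] != before[name]}

# Two lines copied from the output above; in practice the text would come
# from running `netstat -s -P ip` twice, a few minutes apart.
sample = """
ipOutDiscards = 0 ipOutNoRoutes =117387
ipReasmTimeout = 15 ipReasmReqds = 0
"""
later = sample.replace("117387", "117399")  # hypothetical second snapshot

first = parse_netstat_counters(sample)
second = parse_netstat_counters(later)
print(counter_deltas(first, second))  # {'ipOutNoRoutes': 12}
```

If ipOutNoRoutes is the only counter that jumps during a failure window, that would point at a routing lookup problem on the zone rather than at the poller or the remote devices.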

Solaris ships with NTP code that is probably more than a decade old. xntpd is version 3 (probably with some patches by Sun), but ntp version 4 has been out for years.
That said, even the ancient version 3 stuff is usually functional, so the fact that yours isn't working seems somewhat odd.
But if you can't restart NTP, then it'll be difficult to debug. Looks like it's running, but has no servers configured. Possibly at the time the machine booted, the names could not be resolved? So NTPD came up with no servers listed. Just a guess.
Darren

Similar Messages

Understanding icmp polling interval in NMS

Hi all,
I am basing my question on the logic that a device can be polled successfully at 09:54:00, go off-line at 09:54:32 and return on-line again at 09:54:98 in time for the next NMS poll and this would show no down time at all.
What I am seeing is the opposite: I think the device is being polled and is failing to respond, then it comes back on-line x seconds later and is successfully polled, and the NMS reports a down time of 49 seconds (the difference between when it physically came back on-line and when it was next polled). Does the SNMP agent have some internal mechanism to retain the up/down duration so that it can be reported back to the NMS once the device is back on-line and being polled successfully? Or is the 49-second down-time duration generated by the NMS itself, based on how long ago it was last able to poll the device? If it is the latter, why would this be 49 seconds when the polling interval is 60 seconds?
Problem Specifics
My network monitoring is set up to poll every 60 seconds and has an icmp threshold set of 2 seconds.
As I understand it, this means that every 60 seconds a device will be sent a ping and it has up to 2 seconds with which to respond otherwise it is classed as down.
Looking at the reports I can see that a router did not respond and shows a down duration of 49 seconds, before reporting up time of 1 minute in the next entry. What I am trying to work out is how it got to this 49 second duration value?
If the NMS was set to ping the device every 60 seconds and the first ping occurred at 09:54:00 and succeeded, the next poll would be at 09:55:00. Let's say that the 09:55:00 ICMP poll was unsuccessful. This is then reported back to the NMS as down. It would then poll again at 09:56:00 to see if the device is back on-line, and in my case it was, only the total downtime shows as 49 seconds.
Should this not always be 1 minute or multiples thereof?
Could it be that the NMS, once it detects that a device is down, steps up its polling of that device with repeated polls for a specified period until it gives up and reverts to its regular schedule? I.e. it pings every second and on the 49th second it got a response?
Thanks in advance
David
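
The arithmetic behind that last hypothesis can be sketched as follows. This is not how any particular NMS is documented to work: it assumes the NMS measures downtime from the first failed poll to the first successful one, and the 48.5-second recovery time and 1-second retry interval are hypothetical values chosen to reproduce the 49-second figure.

```python
import math

def reported_downtime(first_failed_poll, device_back_at, retry_interval):
    """Seconds between the first failed poll and the first poll that succeeds.

    All times are seconds on a common clock. After the first failure the
    poller is assumed to re-poll every `retry_interval` seconds.
    """
    outage_seen = device_back_at - first_failed_poll
    retries_needed = math.ceil(outage_seen / retry_interval)
    return retries_needed * retry_interval

# Only the regular 60 s schedule: downtime is always a multiple of 60.
print(reported_downtime(0, 48.5, 60))  # 60
# Fast 1 s retries after the failure: the 49th retry is the first to succeed.
print(reported_downtime(0, 48.5, 1))   # 49
```

So under these assumptions a 49-second figure is only possible if the poller retries faster than its regular interval once a device is marked down.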

Hi,
Could you let me know how frequently the file gets loaded to the FTP server?
Is your file too large to get loaded? If so, try increasing the polling interval. Also make sure that you have specified the polling interval in seconds and not in minutes, i.e. 2 minutes should be specified as 120.
Regards,
Nithiyanandam

Install NetConnect in Global Zone of Solaris 10 x86 with 5 local zone fail

Problem:
Install NetConnect in Global Zone of Solaris 10 x86 with 5 local zone failed
But I didn't have any issue with the same pkg in Solaris 9
Steps taken:
# groupadd netcon
# useradd -d /export/home/netcon -g netcon -m netcon
# vi /etc/shadow
change LK to NP
# ./UninstallNetConnect.003.002.001.sh
# ./InstallNetConnect.003.002.001.sh
Enter the user account to use: netcon
Enter group: netcon
Installing Sun(SM) Net Connect Proxy Core as <SUNWsrspx>
## Installing part 1 of 1.
/etc/opt/SUNWsrspx/CustomerCert.pem
/etc/opt/SUNWsrspx/SRSCACert.pem
/etc/opt/SUNWsrspx/binaries
/etc/opt/SUNWsrspx/srsproxyconfig.cfg
/opt/SUNWsrspx/bin/srsexec
/opt/SUNWsrspx/bin/srsinstall
/opt/SUNWsrspx/bin/srsinstallmode
/opt/SUNWsrspx/bin/srsproxy
/opt/SUNWsrspx/bin/srspxrun
/opt/SUNWsrspx/bin/srspxstat
/opt/SUNWsrspx/bin/srspxtrace
/opt/SUNWsrspx/bin/srsuser
/opt/SUNWsrspx/bin/srsxfer
/opt/SUNWsrspx/lib/srsimapi.jar
/usr/lib/libsrsimapi.so.1
[ verifying class <none> ]
## Executing postinstall script.
copying initial install customer cert into place
copying initial install srs cert into place
copying initial install proxy config file into place
removing any existing uninstallscript before copying the correct one
copying uninstall script into place
/var/sadm/pkg/SUNWsrspx/install/postinstall: /opt/SUNWsrspx/bin/srspxrun: cannot execute
proxy queue initialization failed
pkgadd: ERROR: postinstall script did not complete successfully
Installation of <SUNWsrspx> failed.
ERROR: pkgadd failed for: SUNWsrspx
Please correct this situation and rerun the installation.
Exiting installation.
# cat /var/adm/messages
Oct 7 00:00:01 planet root: [ID 702911 daemon.error] ERROR: proxy queue initialization failed
Any clues? Or is this only possible in a global zone that doesn't have any local zones?
thanks in advance

I believe that it does not work on x86; check the platform with pkginfo -l SUNWsrspx.
Willy Suarez
Support UNIX
Colombia

ICMP socket in solaris 9

Hi
I'm trying to create an alternative ping routine for my application so that it doesn't need to make a system call to run ping.
After creating this ping routine, I tried to test it on Solaris 10 and got a permission denied error. After looking for an explanation, I found a way to enable ICMP privileges with the command
usermod -K defaultpriv=basic,net_icmpaccess user
but I can't find any way to enable this privilege under Solaris 9, because the syntax is quite different, and I always get errors when trying.
Can anyone help me?
best regards.
Ilde.
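
For reference, the part of such a ping routine that needs no special privilege is building the ICMP echo request itself. The sketch below is in Python rather than the poster's language and only constructs and checks the packet; the identifier, sequence number and payload are arbitrary illustration values. Sending it would still require a raw socket (for example `socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)`), which is where the permission error comes from.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:   # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(ident: int, seq: int, payload: bytes = b"ping") -> bytes:
    """ICMP echo request: type 8, code 0, checksum, identifier, sequence, payload."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)  # checksum field zeroed
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, ident, seq) + payload

packet = build_echo_request(ident=0x1234, seq=1)
# A packet carrying a correct checksum sums to zero when checksummed again.
assert internet_checksum(packet) == 0
```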

There is a lot to take into consideration before deciding on your backup strategy, but if you're a little confused by the different options (UFS snapshots, JumpStart, good old ufsdump, etc.), the choice is still simple: ufsdump/ufsrestore are the tools you would use to perform regular backups. UFS snapshots cannot be considered a reliable backup, since a snapshot depends on the reliability of the filesystem. If you lose the filesystem, you lose the snapshot, so to be a true backup it still needs to be copied to an external source. I would not call flash archives and JumpStart a real backup either, since the real purpose of both is to provide easily deployable system images for maintaining a large number of systems. I'm sure it is possible to concoct a backup system relying on flash archives or JumpStart, but it would be quite cumbersome compared to true backup solutions. If you're looking for a free backup solution on Solaris, the way to go would be either ufsdump/ufsrestore or possibly the open-source Amanda backup suite if you have a more complicated setup with tape loaders.

Windows Vista and Solaris...10...install failed...help.

Hi everyone!!
I installed Solaris twice on my system.
I previously had Vista on it...
I made a partition for Solaris...
After installation, Solaris boots fine,
but Windows is totally corrupted and doesn't boot normally.
I'll have to erase the computer's whole HD and then reinstall it again.
The two OSes are never both working at the same time:
with Solaris already installed, if I try to repair or reinstall Windows Vista, it erases the Solaris partition completely.
Please help!!! What should I do to get my system running with both OSes?
Any replies are much awaited.
Thank you so much in advance....
-sriya
I have a Dell Inspiron 640m... just delivered yesterday.

Install Windows first, then Solaris {not the other way round}. When Solaris installer detects Windows installation and prompts you whether to preserve Windows partition, make sure to select 'preserve' option. I believe Solaris installer creates appropriate GRUB entries for Windows and Solaris. Even if it doesn't, you can always edit the GRUB menu after the installation is complete.
Check the following web site if you need detailed instructions:
http://multiboot.solaris-x86.org/index.html

Upgrade from Solaris 8 (5.8) to 5.10 fails

Hi everyone...
Here is a curly one for the Sun gurus.
I attempted to perform the upgrade from Sol 8 (Dec/02) to Sol 10 (from the CD/DVD Jan/06) and got all the way past the "Test Upgrade Profile". The initialization started after I clicked the GUI "Install Now" button. However, it fails after an hour or so, each and every time.
We have DiskSuite 4.2.1 defined on a 3310 disk array as well as on the 2 system disks, which are striped and mirrored. I cannot do a Live Upgrade as I don't have a spare disk, so Sun Support in India suggested I do the upgrade in place... but it crashes while attempting to copy files from the CD to the server, and there are some other strange error messages as well.
On the first attempt I left vfstab untouched; on the second I commented out the disk array (d30, d31, and d32) devices. Both attempts failed. The install logs will be uploaded.
vfstab:
=============>
#device          device          mount          FS     fsck     mount     mount
#to mount     to fsck          point          type     pass     at boot     options
#/dev/dsk/c1d0s2 /dev/rdsk/c1d0s2 /usr          ufs     1     yes     -
fd     -     /dev/fd     fd     -     no     -
/proc     -     /proc     proc     -     no     -
/dev/md/dsk/d1     -     -     swap     -     no     -
/dev/md/dsk/d0     /dev/md/rdsk/d0     /     ufs     1     no     logging
/dev/md/dsk/d3     /dev/md/rdsk/d3     /var     ufs     1     no     logging
/dev/md/dsk/d4     /dev/md/rdsk/d4     /archive     ufs     2     yes     logging
/dev/md/dsk/d7     /dev/md/rdsk/d7     /export/home     ufs     2     yes     logging
/dev/md/dsk/d6     /dev/md/rdsk/d6     /usr/openv     ufs     2     yes     logging
/dev/md/dsk/d30     /dev/md/rdsk/d30     /software     ufs     2     yes     logging
/dev/md/dsk/d31     /dev/md/rdsk/d31     /databases     ufs     2     yes     logging
/dev/md/dsk/d32     /dev/md/rdsk/d32     /spare     ufs     2     yes     logging
swap     -     /tmp     tmpfs     -     yes     -
=================<
The install_log shows weird errors that stop the upgrade.
See below:
==========>
Error opening file /a/var/sadm/system/admin/CLUSTER.
getInstalledPkgs: Unable to access /a/var/sadm/pkg
copyOldClustertoc: could not copy /a/var/sadm/system/admin/.clustertoc to /tmp/clustertocs/.old.clustertoc
Error:
Error: ERROR: The specified root and/or boot was not found or was not upgradeable
Pfinstall failed. Exit stat= java.lang.UNIXProcess@20ca8b 2
word must be specified if an upgrade with disk space reallocation is required
Processing profile
Checking c2t0d0s0 for an upgradeable Solaris image
     Unable to start Solaris Volume Manager for unknown, c2t0d0s0 is not upgradeable
ERROR: The specified root and/or boot was not found or was not upgradeable
==================<
Initially I had to add a number of packages/patches per note 72099, which was displayed at the very beginning of the Sol 10 upgrade. I also did and checked the necessary bits per document 16141-1.
But the upgrade just fails. I would like to keep the disk array and its data intact for the upgrade, as it holds Oracle 11i Apps databases and it's a bit of work to redefine the metadevices and reload Oracle and the other bits it needs, as well as NetBackup etc.
So, are there any bright sparks out there who can help?
Please email me.
Cheers
Roger Sager
Oracle 11i Apps DBA
Sydney, Australia

Upload of final install log file (as included in 1st topic discussion)

Solaris 10 x86 daylight saving time patch fails

Hello! I'm having trouble getting my solaris box to recognize the new timezone change. I've installed patches 122033-04 and 121208-03 as you can see here:
$ showrev -p | fgrep 122033
Patch: 122033-04 Obsoletes: Requires: Incompatibles: Packages: SUNWcsu
$ showrev -p | fgrep 121208
Patch: 121208-02 Obsoletes: 118345-13, 118849-01, 120018-02 Requires: 118844-22 Incompatibles: Packages: SUNWcsu, SUNWcsr, SUNWcsl, SUNWtoo, SUNWcslr, SUNWhea, SUNWbtool
Patch: 121208-03 Obsoletes: 118345-13, 118849-01, 120018-02, 118565-03 Requires: 118844-14, 118844-22 Incompatibles: Packages: SUNWcsu, SUNWcsr, SUNWcsl, SUNWtoo, SUNWcslr, SUNWhea, SUNWbtool
$ uname -a
SunOS icarus 5.10 Generic_118844-26 i86pc i386 i86pc
After the reboot, the time from the date command jumped forward one hour, but the timezone still shows "CDT", not "CST" for central time.
This problem also seems to affect time calculations in my java version "1.5.0_06" environment.
Does anyone have an idea what could be happening? Thanks for any help.
Joe Gonzalez

Hi Celerius,
I thought that Central Daylight Time would be the correct timezone since DST began last Sunday? CST is the winter timezone.
What might be impacting your java time calculations is a problem which we noticed with java 1.5.0_06, where there is an issue with time zones being incorrectly mapped. We found that the use of 1.5.0_04 was more stable. Maybe you can try that on your test platform to see if it helps.
Cheers.
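
To double-check which abbreviation is expected on a given date, the US rule in force since 2007 (DST from the second Sunday in March to the first Sunday in November) can be computed with plain calendar arithmetic. This is only a day-granularity sketch for US Central time: it ignores the 2 a.m. switch-over hour and does not apply to years before 2007.

```python
from datetime import date, timedelta

def nth_sunday(year: int, month: int, n: int) -> date:
    """Date of the n-th Sunday of the given month."""
    first = date(year, month, 1)
    days_to_sunday = (6 - first.weekday()) % 7  # Monday=0 ... Sunday=6
    return first + timedelta(days=days_to_sunday + 7 * (n - 1))

def us_central_abbreviation(day: date) -> str:
    """'CDT' while daylight saving time is in effect, otherwise 'CST'."""
    start = nth_sunday(day.year, 3, 2)   # second Sunday in March
    end = nth_sunday(day.year, 11, 1)    # first Sunday in November
    return "CDT" if start <= day < end else "CST"

print(nth_sunday(2007, 3, 2))                       # 2007-03-11
print(us_central_abbreviation(date(2007, 3, 14)))   # CDT
print(us_central_abbreviation(date(2007, 1, 15)))   # CST
```

So a patched box showing "CDT" a few days after the March 2007 change is behaving as expected.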

Solaris 10 x86 1/06 CD1 boot fail

Hello all.
I downloaded the latest version from sun.com, unzipped the ISOs and burned them to CDs.
When I try to boot from the first CD (on different machines), I receive these errors:
/kernel/fs/specfs: undefined symbol ''
/kernel/fs/specfs: undefined symbol ''
... several screens same message ...
/kernel/fs/specfs: undefined symbol '' WARNING: mod_load: cannot load module 'specfs'
Press any key to reboot...
Installation options 2 and 3 stop without any messages.
The data on the first CD reads fine.
What is my problem?

I'd compute MD5 checksums for the files you've downloaded and verify that they match the known good checksums for the downloadable files:
http://whacked.net/2005/12/22/solaris-10-update-1-s10u1-md5-sums/
Maybe there has been some sort of data corruption in the files you've downloaded / unzipped on your machine.
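
If no md5 tool is at hand on the download machine, the check can be done with a few lines of Python. This is a minimal sketch: the demo runs on a throwaway file containing `abc`, whereas a real check would point `md5_of_file` at the downloaded ISO and compare against the published value.

```python
import hashlib
import os
import tempfile

def md5_of_file(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in chunks so large ISOs fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path, published_md5):
    """Compare against a published checksum, ignoring case and whitespace."""
    return md5_of_file(path) == published_md5.strip().lower()

# Demo on a temporary file instead of a real ISO image.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo_path = tmp.name
demo_digest = md5_of_file(demo_path)
demo_ok = matches_published(demo_path, " 900150983CD24FB0D6963F7D28E17F72\n")
os.remove(demo_path)
print(demo_digest, demo_ok)
```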

Solaris 10 sshd + GSSAPI auth appears to fail with long usernames.

Solaris 10 sshd using GSSAPI mode appears to fail with long usernames.
We have recently jumbo-patched a Solaris 10 server and a Windows 2k3 Kerberos KDC. We wish to provide single sign-on for our Windows users, as written up in http://220-245-28-18.static.tpgi.com.au/~irvinee/gssapi-sol10/gssapi-howto.html
Everything is fine until a user with a ten-character username comes along. The ten-character username does not get the single sign-on experience.
However, he can kinit fine on the Solaris 10 server and also on other Unix clients.
If I switch from the stock Solaris 10 sshd to a self-compiled OpenSSH linked against MIT Kerberos, the ten-character username gets single sign-on and all is well.
I note I have no problem when the server is FreeBSD 6.2 and the client is stock solaris 10 ssh.
It seems to be the Solaris 10 sshd only that is affected. Before I write up a bug report, has anyone else come across the same problem?

I finally got it working. I think my problem was that I was copying and pasting the /etc/pam.conf from Gary's guide into the pam.conf file.
There were unseen carriage returns mucking things up. So following a combination of the two docs worked. Starting with:
http://web.singnet.com.sg/~garyttt/Configuring%20Solaris%20Native%20LDAP%20Client%20for%20Fedora%20Directory%20Server.htm
Then following the steps at "Authentication Option #1: LDAP PAM configuration " from this doc:
http://docs.lucidinteractive.ca/index.php/Solaris_LDAP_client_with_OpenLDAP_server
for the pam.conf, got things working.
Note: ensure that your user has the shadowAccount value set in the objectClass

ORA-12560: TNS:protocol adapter error..checked a lot of forums but it fails

I checked a lot of forums out here but no satisfying answers.
I'll go with the summary again.
I have Windows XP.
First I installed Oracle 9i on E drive for practice purpose. The installation was done successfully. I opened Internet explorer. Typed "http://hash/isqlplus" in the address bar and pressed enter. The page displayed and I logged in. Logging in was successful.
Now I installed the Developer Suite on the C drive. Installation was complete. Everything seemed fine but when I tried logging into my isqlplus again it started giving me an error saying "ORA-12560: TNS:protocol adapter error"
I have no idea how to go ahead with this.
Will be thankful if someone could clear this up.
Thanks.
Hashem.

If you had searched more, you would have seen loads of answers.
You now have more than one ORACLE_HOME, and your latest install will be the current default ORACLE_HOME.
If you have not set TNS_ADMIN, all Oracle installations on your server will look for the listener configuration (and tnsnames) in your default ORACLE_HOME, which I guess will be the ORACLE_HOME of the Developer Suite.
So set TNS_ADMIN in the environment settings on the server to point to the correct place.
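
The lookup order described above can be sketched as a small function. This is a simplification for illustration only: it assumes the common convention that TNS_ADMIN wins when set and that the fallback is the network/admin directory under ORACLE_HOME, it ignores the other places Oracle Net may search (for example the Windows registry), and the paths in the demo are hypothetical.

```python
import os

def tnsnames_path(env):
    """Where tnsnames.ora is expected: TNS_ADMIN if set, else ORACLE_HOME/network/admin."""
    tns_admin = env.get("TNS_ADMIN")
    if tns_admin:
        return os.path.join(tns_admin, "tnsnames.ora")
    oracle_home = env.get("ORACLE_HOME")
    if oracle_home:
        return os.path.join(oracle_home, "network", "admin", "tnsnames.ora")
    return None

# Hypothetical homes: the Developer Suite became the default ORACLE_HOME.
without_tns_admin = tnsnames_path({"ORACLE_HOME": "/oracle/devsuite"})
with_tns_admin = tnsnames_path({"ORACLE_HOME": "/oracle/devsuite",
                                "TNS_ADMIN": "/oracle/ora92/network/admin"})
print(without_tns_admin)
print(with_tns_admin)
```

Calling `tnsnames_path(os.environ)` on the affected machine shows which file the current environment would resolve to under these assumptions.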

MySAP 2005 - QM Module - QE51N - Partial Lot UD Follow Up Actions Failing

Hi,
I am currently working on a project involving an upgrade from SAP 4.5B to mySAP 2005.
Here is our situation:
We deal with partial lots / inspection points.
Whenever we record results for each partial lot and valuate, a follow up action is supposed to execute.
The Partial Lot Pop up box appears as expected from SAP standard functionality and the partial lot valuation shows as well as the follow up action that is supposed to execute.
But the follow up action is not executing for the partial lots.
We also have another follow up action that executes after a Usage Decision is made on the overall Inspection Lot and that executes.
So I'm not sure if it is a mySAP issue.
We performed this testing using the QE51N transaction.
When we used QE51 instead, the partial lot follow-up actions executed, so I'm not sure what is happening in QE51N.
Can anyone please help?

Hi Ram,
Selected Sets (QS55):
- Firstly I have defined Catalog type 3 selected sets for usage decisions.
- Within the selected set I have Accept / Reject codes defined with appropriate
follow up actions.
IMG Config:
Quality Management --> Inspection Planning --> General --> Define Identifier for Inspection Points.
In that area, i have my default valuations set where all characteristics are accepted or when one is rejected.
Quality Management --> Quality Inspection --> Inspection Lot Completion --> Define Follow-Up Actions
I have defined follow up actions here and my follow up functions are set to "Usage decision for partial lot"
I believe all settings are correct since we upgraded our 4.5B SAP test box to mySAP 2005 and overall things were maintained.
So not too sure what is happening.
No error occurs. The partial lot follow up action doesn't even execute in QE51N.
I went into the code of the function and set a temporary breakpoint there to pause execution.
In QE51N, nothing happened, so I could tell the follow-up action wasn't even executed.
In QE51, however, the follow-up action executed, the breakpoint kicked in, and execution paused with a debug session.

Writing /dev/poll application in Solaris 8

Hi all,
I am trying to implement /dev/poll based polling in Solaris 8. I got the
sample code from the Solaris 2.8 man pages for /dev/poll.
I am able to compile the program, but when I try to run it the ioctl
call fails with an "Invalid Argument" error.
What could be the reason for the ioctl failure? Any help or pointers
will be greatly appreciated.
regards
Rajesh K
Section of the code .
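int i = 0, a = 0, clen;
int j = 0, pret = 0;
struct sockaddr_in serv, cli;
struct pollfd pfd[2];
/* struct dvpoll dopoll; */
dvpoll_t dopoll;
int wfd;    /* descriptor for /dev/poll; the open() call is not part of this excerpt */

while (1)
{
    dopoll.dp_timeout = -1;   /* block until an event arrives */
    dopoll.dp_nfds = 2;       /* number of entries in the result array */
    dopoll.dp_fds = pfd;      /* results are written into this array */
    errno = 0;
    if ((pret = ioctl(wfd, DP_POLL, &dopoll)) < 0)
    {
        perror("/dev/poll ioctl DP_POLL error");
        printf(" Errno = %d \n", errno);
        exit(1);              /* non-zero status, since this is an error path */
    }
}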

pfd is an array of pollfd structures and has been initialised with the
file descriptors that have to be polled. The write of that structure to /dev/poll is also successful.
It looks like ioctl is not able to recognise this device. I may have to configure its device number, but I don't know how.

ICMP Timeout Alarm due to TCP Protocol Memory Allocation Failure ?

Hello Experts ,
>> Device uptime suggests there was no reboot
ABCSwitch uptime is 28 weeks, 13 hours, 50 minutes
System returned to ROM by power-on
System restarted at 13:09:45 UTC Mon Aug 5 2013
System image file is "flash:c2950-i6k2l2q4-mz.121-22.EA12.bin"
>> But observed logs mentioning Memory Allocation Failure for TCP Protocol Process ( Process ID 43) due to Memory Fragmentation
003943: Feb 18 02:14:27.393 UTC: %SYS-2-MALLOCFAIL: Memory allocation of 36000 bytes failed from 0x801E876C, alignment 0
Pool: Processor Free: 120384 Cause: Memory fragmentation
Alternate Pool: I/O Free: 682800 Cause: Memory fragmentation
-Process= "TCP Protocols", ipl= 0, pid= 43
-Traceback= 801C422C 801C9ED0 801C5264 801E8774 801E4CDC 801D9A8C 8022E324 8022E4BC
003944: Feb 18 02:14:27.397 UTC: %SYS-2-CFORKMEM: Process creation of TCP Command failed (no memory).
-Process= "TCP Protocols", ipl= 0, pid= 43
-Traceback= 801E4D54 801D9A8C 8022E324 8022E4BC
According to the Cisco documentation for troubleshooting memory issues on Cisco IOS 12.1 (http://www.cisco.com/c/en/us/support/docs/ios-nx-os-software/ios-software-releases-121-mainline/6507-mallocfail.html#tshoot4), the TCP Protocols process could not be started because memory was fragmented:
Memory Fragmentation Problem or Bug
This situation means that a process has consumed a large amount of processor memory and then released most or all of it, leaving fragments of memory still allocated either by this process, or by other processes that allocated memory during the problem. If the same event occurs several times, the memory may fragment into very small blocks, to the point where all processes requiring a larger block of memory cannot get the amount of memory that they need. This may affect router operation to the extent that you cannot connect to the router and get a prompt if the memory is badly fragmented.
This problem is characterized by a low value in the "Largest" column (under 20,000 bytes) of the show memory command, but a sufficient value in the "Freed" column (1MB or more), or some other wide disparity between the two columns. This may happen when the router gets very low on memory, since there is no defragmentation routine in the IOS.
If you suspect memory fragmentation, shut down some interfaces. This may free the fragmented blocks. If this works, the memory is behaving normally, and all you have to do is add more memory. If shutting down interfaces doesn't help, it may be a bug. The best course of action is to contact your Cisco support representative with the information you have collected.
>>Further TCP -3- FORKFAIL logs were seen
003945: Feb 18 02:14:27.401 UTC: %TCP-3-FORKFAIL: Failed to start a process to negotiate options.
-Traceback= 8022E33C 8022E4BC
003946: Feb 18 02:14:27.585 UTC: %TCP-3-FORKFAIL: Failed to start a process to negotiate options.
-Traceback= 8022E33C 8022E4BC
003947: Feb 18 02:14:27.761 UTC: %TCP-3-FORKFAIL: Failed to start a process to negotiate options.
-Traceback= 8022E33C 8022E4BC
003948: Feb 18 02:14:27.929 UTC: %TCP-3-FORKFAIL: Failed to start a process to negotiate options.
-Traceback= 8022E33C 8022E4BC
003949: Feb 18 02:14:29.149 UTC: %TCP-3-FORKFAIL: Failed to start a process to negotiate options.
-Traceback= 8022E33C 8022E4BC
According to the error explanation in the Cisco documentation (http://www.cisco.com/c/en/us/td/docs/ios/12_2sx/system/messages/122sxsms/sm2sx09.html#wp1022051), the process that handles TCP requests from a client could not be created or initialized:
Error Message %TCP-3-FORKFAIL: Failed to start a process to negotiate options.
Explanation The system failed to create a process to handle requests from a client. This condition could be caused by insufficient memory.
Recommended Action Reduce other system activity to ease memory demands.
But I am still not sure what the exact root cause is, because:
1. The GET / GETNEXT / GETBULK messages from the SNMP manager (here, IBM Tivoli Netcool) use the default SNMP port 161, which is UDP and not TCP.
2. If it is an ICMP polling failure from IBM Tivoli Netcool, ICMP is protocol number 1 in the Internet layer of the TCP/IP protocol suite, while TCP is protocol number 6 in the transport layer.
So I still do not see how a TCP Protocols process failure could have caused an ICMP timeout. Please help!
Could you please explain what the TCP Protocols process handles in a Cisco switch?
Regards,
Anup
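
As a side note on reading these messages: the MALLOCFAIL entry already shows the fragmentation signature, since the pool reports more free bytes than the failed request asked for. The sketch below pulls those numbers out of the log lines quoted above. The regular expressions are written against exactly these sample lines and are an assumption about the format, not a general IOS log parser.

```python
import re

MALLOCFAIL = re.compile(r"%SYS-2-MALLOCFAIL: Memory allocation of (\d+) bytes failed")
POOL = re.compile(r"^(?:Alternate )?Pool:\s*(.+?)\s+Free:\s*(\d+)\s+Cause:\s*(.+)$")

def summarize_mallocfail(lines):
    """Extract the requested size and per-pool free bytes from MALLOCFAIL output."""
    result = {"requested": None, "pools": []}
    for line in lines:
        line = line.strip()
        failed = MALLOCFAIL.search(line)
        if failed:
            result["requested"] = int(failed.group(1))
            continue
        pool = POOL.match(line)
        if pool:
            result["pools"].append({"name": pool.group(1),
                                    "free": int(pool.group(2)),
                                    "cause": pool.group(3)})
    return result

log_lines = [
    "003943: Feb 18 02:14:27.393 UTC: %SYS-2-MALLOCFAIL: Memory allocation of 36000 bytes failed from 0x801E876C, alignment 0",
    "Pool: Processor Free: 120384 Cause: Memory fragmentation",
    "Alternate Pool: I/O Free: 682800 Cause: Memory fragmentation",
]
summary = summarize_mallocfail(log_lines)
for pool in summary["pools"]:
    # Free bytes exceed the request, yet the allocation failed: no single
    # contiguous block was large enough.
    print(pool["name"], pool["free"], pool["free"] > summary["requested"])
```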

Hello Experts ,
>> Device uptime suggests there was no reboot
ABCSwitch uptime is 28 weeks, 13 hours, 50 minutes
System returned to ROM by power-on
System restarted at 13:09:45 UTC Mon Aug 5 2013
System image file is "flash:c2950-i6k2l2q4-mz.121-22.EA12.bin"
>> But observed logs mentioning Memory Allocation Failure for TCP Protocol Process ( Process ID 43) due to Memory Fragmentation
003943: Feb 18 02:14:27.393 UTC: %SYS-2-MALLOCFAIL: Memory allocation of 36000 bytes failed from 0x801E876C, alignment 0
Pool: Processor Free: 120384 Cause: Memory fragmentation
Alternate Pool: I/O Free: 682800 Cause: Memory fragmentation
-Process= "TCP Protocols", ipl= 0, pid= 43
-Traceback= 801C422C 801C9ED0 801C5264 801E8774 801E4CDC 801D9A8C 8022E324 8022E4BC
003944: Feb 18 02:14:27.397 UTC: %SYS-2-CFORKMEM: Process creation of TCP Command failed (no memory).
-Process= "TCP Protocols", ipl= 0, pid= 43
-Traceback= 801E4D54 801D9A8C 8022E324 8022E4BC
According to Cisco documentation for Troubleshooting Memory issues on Cisco IOS 12.1 (http://www.cisco.com/c/en/us/support/docs/ios-nx-os-software/ios-software-releases-121-mainline/6507-mallocfail.html#tshoot4 ), which suggests the TCP Protocols Process could not be started due to Memory being fragmented
Memory Fragmentation Problem or Bug
This situation means that a process has consumed a large amount of processor memory and then released most or all of it, leaving fragments of memory still allocated either by this process, or by other processes that allocated memory during the problem. If the same event occurs several times, the memory may fragment into very small blocks, to the point where all processes requiring a larger block of memory cannot get the amount of memory that they need. This may affect router operation to the extent that you cannot connect to the router and get a prompt if the memory is badly fragmented.
This problem is characterized by a low value in the "Largest" column (under 20,000 bytes) of the show memory command, but a sufficient value in the "Freed" column (1MB or more), or some other wide disparity between the two columns. This may happen when the router gets very low on memory, since there is no defragmentation routine in the IOS.
If you suspect memory fragmentation, shut down some interfaces. This may free the fragmented blocks. If this works, the memory is behaving normally, and all you have to do is add more memory. If shutting down interfaces doesn't help, it may be a bug. The best course of action is to contact your Cisco support representative with the information you have collected.
>>Further TCP -3- FORKFAIL logs were seen
003945: Feb 18 02:14:27.401 UTC: %TCP-3-FORKFAIL: Failed to start a process to negotiate options.
-Traceback= 8022E33C 8022E4BC
003946: Feb 18 02:14:27.585 UTC: %TCP-3-FORKFAIL: Failed to start a process to negotiate options.
-Traceback= 8022E33C 8022E4BC
003947: Feb 18 02:14:27.761 UTC: %TCP-3-FORKFAIL: Failed to start a process to negotiate options.
-Traceback= 8022E33C 8022E4BC
003948: Feb 18 02:14:27.929 UTC: %TCP-3-FORKFAIL: Failed to start a process to negotiate options.
-Traceback= 8022E33C 8022E4BC
003949: Feb 18 02:14:29.149 UTC: %TCP-3-FORKFAIL: Failed to start a process to negotiate options.
-Traceback= 8022E33C 8022E4BC
According to Error Explanation from Cisco Documentation (http://www.cisco.com/c/en/us/td/docs/ios/12_2sx/system/messages/122sxsms/sm2sx09.html#wp1022051)
suggests the TCP handles from a client could not be created or initialized
Error Message %TCP-3-FORKFAIL: Failed to start a process to negotiate options.
Explanation The system failed to create a process to handle requests from a client. This condition could be caused by insufficient memory.
Recommended Action Reduce other system activity to ease memory demands.
But I am still not sure what the exact root cause is, because:
1. The GET / GETNEXT / GETBULK messages from the SNMP manager (here, IBM Tivoli Netcool) use the default SNMP port 161, which is
   UDP and not TCP.
2. If it is an ICMP polling failure from IBM Tivoli Netcool, ICMP is protocol number 1 in the Internet layer of the TCP/IP protocol suite, whereas TCP is protocol number 6 in the transport layer.
So I am still not sure how a TCP process failure could have caused an ICMP timeout. Please help!
Could you please explain what the TCP process handles in a Cisco switch?
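To make the protocol numbers in the two points above concrete, here is a small sketch that prints them from Python's standard socket module. It only illustrates that ICMP, TCP and UDP are distinct IP protocols; it says nothing about how the switch schedules its internal processes. The dictionary name is an arbitrary choice for this example.

```python
import socket

# IANA-assigned IP protocol numbers, as exposed by the standard library.
PROTOCOLS = {
    "ICMP": socket.IPPROTO_ICMP,  # used by the ping poller
    "TCP": socket.IPPROTO_TCP,    # the protocol named in %TCP-3-FORKFAIL
    "UDP": socket.IPPROTO_UDP,    # transport for SNMP GET/GETNEXT/GETBULK on port 161
}

if __name__ == "__main__":
    for name, number in PROTOCOLS.items():
        print(f"{name}: protocol number {number}")
```

On the wire the three are independent, so any link between the FORKFAIL messages and the ping failures would have to come from a shared resource on the device, such as the memory shortage mentioned in the Cisco explanation.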
Regards,
Anup

Oracle 9i install failing on Solaris 10

The Oracle 9i install on Solaris 10 fails at the database creation stage with the Oracle error "Out of Memory".
I have an Ultra 10 with 1GB RAM, 2GB swap and plenty of space in the filesystem where I am installing Oracle.
The Oracle release notes ask for increased values for shmmax, shmmni, etc. by editing /etc/system. According to the Solaris 10 documentation, this is not required because the kernel allocates those values dynamically, but it looks like it does not. How can I verify this?
rctladm talks about shmmax values only in terms of projects. How can I verify that all applications running on this box get a higher SHMMAX value, rather than setting it on a per-project basis?
In older versions, sysdef could tell us the current SHMMAX value. Is there a comparable command in Solaris 10?
I am not that comfortable with the rctladm features. Can someone show some examples?
Has anyone been successful in installing Oracle 9i on Solaris 10? I heard that Oracle patch sets 9.2.0.5 and 9.2.0.6 are certified with Solaris 10. Does that mean the original version of 9.2 does not install on Solaris 10?

"Net Configuration Assistant is failing" is reported during installation. I tried twice and got the same problem. I tried to run 'netca' from the command line and got the same problem: a Java runtime error. I tried to install a patch, but I could not find Oracle patches for Solaris 10; the available patches are for Solaris 9.
Does anyone have a suggestion?
$ ./netca
ld.so.1: /usr/ora/920/oracle.swd.jre/bin/sparc/native_threads/jre: fatal: relocation error: file /usr/ora/920/oracle.swd.jre/bin/../lib/sparc/native_threads/libawt.so: symbol XShmQueryExtension: referenced symbol not found (/usr/ora/920/oracle.swd.jre/bin/../lib/sparc/native_threads/libawt.so)
ld.so.1: /usr/ora/920/oracle.swd.jre/bin/sparc/native_threads/jre: fatal: relocation error: file /usr/ora/920/oracle.swd.jre/bin/../lib/sparc/native_threads/libawt.so: symbol XShmQueryExtension: referenced symbol not found (/usr/ora/920/oracle.swd.jre/bin/../lib/sparc/native_threads/libawt.so)
java.lang.NullPointerException
at oracle.ewt.lwAWT.BufferedApplet.<init>(Compiled Code)
at oracle.net.ca.NetCA.<init>(Compiled Code)
at oracle.net.ca.NetCA.main(Compiled Code)

Solaris 10: Ipfilter

I am experiencing a weird problem with ipfilter on my Solaris 10 box. It is configured to allow all outgoing traffic. However, after 1-14 days it blocks all outgoing TCP. ICMP works just fine, as I am able to ping the box, but logging on with SSH or using HTTP simply does not work.
Each time this happens I have to restart ipfilter.
My ipfilter configuration is as follows:
block in on bge0
pass out quick all keep state
pass in quick on bge0 proto tcp from any to any port = 22 flags S keep state
pass in quick on bge0 proto tcp from any to any port = 80 flags S keep state
pass in quick on bge0 proto tcp from any to any port = 443 flags S keep state
pass in quick on bge0 proto tcp from any to any port = 3690 flags S keep state
pass in quick on bge0 proto tcp from any to any port = 8080 flags S keep state
pass in quick on bge0 proto tcp from any to any port = 8081 flags S keep state
pass in quick on bge0 proto tcp from any to any port = 8090 flags S keep state
pass in quick on bge0 proto tcp from any to any port = 1099 flags S keep state
pass in quick on bge0 proto tcp from any to any port = 8999 flags S keep state
pass in quick on bge0 proto tcp from any to any port = 9999 flags S keep state
pass in quick on bge0 proto tcp from any to any port = 10099 flags S keep state
pass in quick on bge0 proto udp from any to any port = 4114 keep state
pass in quick on bge0 proto icmp from any to any keep state
I am open to any suggestions as to what can be wrong...
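Since the rule set above repeats the same line for every TCP port, here is a minimal sketch that generates the same rules from a port list, which makes it easier to review or change them in one place. The function and variable names are invented for this example; the interface name and ports are the ones from the configuration above. It only reproduces the posted rules and does not address the blocking problem itself.

```python
IFACE = "bge0"
TCP_PORTS = [22, 80, 443, 3690, 8080, 8081, 8090, 1099, 8999, 9999, 10099]
UDP_PORTS = [4114]


def build_rules(iface=IFACE, tcp_ports=TCP_PORTS, udp_ports=UDP_PORTS):
    """Return the ipfilter rule lines in the same order as the posted config."""
    rules = [
        f"block in on {iface}",
        "pass out quick all keep state",
    ]
    # One stateful inbound rule per allowed TCP port, matching only SYN packets.
    for port in tcp_ports:
        rules.append(
            f"pass in quick on {iface} proto tcp from any to any "
            f"port = {port} flags S keep state"
        )
    # UDP has no flags, so the rule omits "flags S".
    for port in udp_ports:
        rules.append(
            f"pass in quick on {iface} proto udp from any to any "
            f"port = {port} keep state"
        )
    rules.append(f"pass in quick on {iface} proto icmp from any to any keep state")
    return rules


if __name__ == "__main__":
    print("\n".join(build_rules()))
```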

The problem was caused by ipfilter's behavior: it ignores interface aliases. My ipnat rule was:
map aggr150031:1 ...
I have changed into:
map aggr150031 ...
and the things began to work.
Sorry for the noise.
