Node shutdown or crash

When a primary node shuts down or crashes, the node sometimes does not shut down properly because some cluster processes, such as "/etc/rc0.d/K05initrgm stop", keep running. Meanwhile,
the application fails over to the secondary node, but it goes into the intermediate state "Pending Online".
Both are then hung. Below is the status:
cldevicegroup status:
=== Cluster Device Groups ===
--- Device Group Status ---
Device Group Name     Primary     Secondary     Status
devset                -           -             Offline
What might cause the above problem, for example some cluster processes not exiting ...

You did not say which specific Sun Cluster version you are using. Since you refer to /etc/rc0.d/K05initrgm, I assume it is Sun Cluster 3.1. The question is: which update and patch level?
You should make sure the most recent patches for this release are applied. There are known problems with this script hanging, but they have been fixed for quite some time in the corresponding patches.
If you have all recent patches applied, I suggest opening a service call to analyze the problem further.
Greets
Thorsten

Similar Messages

How to force write-behind store on cache node shutdown?

Hi,
I built a small pilot project based on Coherence and am now testing it for failover. I found replication issues with the distributed cache in the following scenario:
- start cache node 1 (JVM instance 1);
- connect Extend client to it and get 1 object from cache (only 1 object in the cache - loaded by CacheStore from DB);
- change the object and put it back (I use EntryProcessor for this);
- start cache node 2 (JVM instance 2);
- stop cache instance 1 (write-behind store wasn't invoked yet: write-delay = 2m);
- load/change the same object on node 2; all changes done on node 1 are lost.
My expectation was that the cache would replicate its data between nodes when a new member joins the cache cluster. The backup count = 1 by default, right?
What should I do to prevent this behavior? Is it possible to force a write-behind store on the cache node shutdown event?
Thanks, Denis.
My cache-config, just in case:
<cache-config>
<caching-scheme-mapping>
<cache-mapping>
<cache-name>AccountCache</cache-name>
<scheme-name>account-distributed</scheme-name>
</cache-mapping>
</caching-scheme-mapping>
<caching-schemes>
<distributed-scheme>
<scheme-name>account-distributed</scheme-name>
<service-name>DistributedCache</service-name>
<serializer>
<class-name>com.tangosol.io.pof.ConfigurablePofContext</class-name>
<init-params>
<init-param>
<param-type>String</param-type>
<param-value>account-pof-config.xml</param-value>
</init-param>
</init-params>
</serializer>
<backing-map-scheme>
<read-write-backing-map-scheme>
<scheme-name>AccountDatabaseScheme</scheme-name>
<internal-cache-scheme>
<local-scheme>

<eviction-policy>LRU</eviction-policy>
<high-units>0</high-units>
<expiry-delay>30m</expiry-delay>
</local-scheme>
</internal-cache-scheme>
<cachestore-scheme>
<class-scheme>
<class-name>com.roox.bss.cache.store.AccountCacheStore</class-name>
<init-params>
<init-param>
<param-type>java.lang.String</param-type>
<param-value>dburl_</param-value>
</init-param>
<init-param>
<param-type>java.lang.String</param-type>
<param-value>user</param-value>
</init-param>
<init-param>
<param-type>java.lang.String</param-type>
<param-value>password</param-value>
</init-param>
</init-params>
</class-scheme>
</cachestore-scheme>
<write-delay>2m</write-delay>
<write-batch-factor>.5</write-batch-factor>
</read-write-backing-map-scheme>
</backing-map-scheme>
</distributed-scheme>
<proxy-scheme>
<service-name>ExtendTcpProxyService</service-name>
<thread-count>10</thread-count>
<acceptor-config>
<tcp-acceptor>
<local-address>
<address>localhost</address>
<port>9098</port>
<reuse-address>true</reuse-address>
<reusable>true</reusable>
</local-address>
</tcp-acceptor>
<serializer>
<class-name>com.tangosol.io.pof.ConfigurablePofContext</class-name>
<init-params>
<init-param>
<param-type>String</param-type>
<param-value>account-pof-config.xml</param-value>
</init-param>
</init-params>
</serializer>
</acceptor-config>
<autostart>true</autostart>
</proxy-scheme>
</caching-schemes>
</cache-config>

solved with autostart=true

Autotune Plug-in Causes Node App to Crash

Has anyone noticed that you can't use the Logic autotune plug-in while running distributed audio? The node application on the node machine crashes if you make any adjustments to the parameters in the autotune plug-in, even though you may only be running the track that uses the plug-in on the host machine . . . NOT on the node machine. It's very strange. Moreover, don't even think about running a track that uses autotune on the node machine, because it's a sonic disaster. So basically you can't run distributed audio if you are using this plug-in. The Apple tech guy was able to reproduce the issue on the phone with me, but they have no solution for the bug as of yet. I am wondering if anyone else has experienced this and/or has a work-around.

Hello,
Welcome to Adobe Forums.
To disable the Contribute plugin in Microsoft Outlook 2013, click the File menu and choose Options.
File > Options > Add-Ins
Then click the Go button next to Manage COM Add-ins.
Highlight the Contribute plugin in the list and remove it.
Regards,
Rajeev.

Using code interface node with dll crashes LV 2011 but not LV 8.6... using max error handling does nothing

Hi all,
I'm having a peculiar problem.
I inherited a project saved in LabVIEW 8.6. The project must use a particular DLL that was built a few years ago. The original developer and the source code for this .dll are long gone. The very core logic exists, in the form of embedded C code, and that's it. The .dll is called through a Code Interface Node in LV 8.6 and this setup manages to "work". However, running the VI that calls this .dll in LV 2011 causes an "insta crash", as in, no "program has stopped responding": an error message pops up, then all LV windows close.
It's very similar to that described in here:
http://digital.ni.com/public.nsf/allkb/D84C9775ABD921CF8625772A005CA50C
but this KB says to try putting the amount of error handling to maximum. I tried that, but it didn't help.
Using the "Debug" option allows me to run the just-in-time debugger with CVI 2010, which then proceeds to say this:
Finally, I manage to get this out of it:
FATAL RUN-TIME ERROR: Unknown source position, thread id 0x000012D4: A non-debuggable thread caused a 'General Protection' fault at address 0x00000000.
I don't think that really helps at all, but it's there.
Here is the function prototype of the .dll:
void _inputPM@224(uint8_t arg1, uint16_t arg2, uint8_t arg3, float arg4, float arg5, float arg6, float arg7, float arg8, float arg9, float arg10, float arg11, float arg12, float arg13, float arg14, float arg15, float arg16, float arg17, uint16_t arg18, uint16_t arg19, uint16_t arg20, uint16_t arg21, uint16_t arg22, uint16_t arg23, uint16_t arg24, uint16_t arg25, uint16_t arg26, float arg27, float arg28, float arg29, float arg30, float arg31, float arg32, float arg33, float arg34, float arg35, float arg36, float arg37, float arg38, float arg39, float arg40, float arg41, float arg42, float arg43, float arg44, float arg45, float arg46, float arg47, uint16_t arg48, uint16_t arg49, uint16_t arg50, uint16_t arg51, uint16_t arg52, float arg53, float arg54, float arg55, float arg56);
(never seen a function take 50 input params like that (wouldn't you use a struct? array? something? But I digress, and I don't know anything about .dlls...))
I do have a ".lib" and a ".cdb" file with the same name as the .dll, but those also look like some kind of compiled file.
I'm sure the answer I'm going to get is that there is no way of telling what's really going on without .dll source code. I'm hoping against hope that there may be another way or hack.
Any ideas? Thank you for your help.
Regards,
Mark G

MarkCG wrote:
Changing the call library node to stdcall (WINAPI) did the trick! No more crash. Thank you very much!
I haven't run LV2011 on Windows XP, only on Windows 7, so I don't know if that makes a difference. However, the call type makes no difference when using LV 8.6 on the same machine.
Now I've got to make the .DLL run correctly on a Compact FieldPoint... and avoid crashing IT.
Thank you for the help all!
Well, the DLL did work fine in LabVIEW 8.6 because earlier versions of LabVIEW automatically proceeded to change the calling convention to stdcall, if they noticed a function name decoration of @# (# = number of bytes passed on the stack) appended to the name. This is the default naming decoration for stdcall functions used by VisualC. However this decoration can be overwritten with linker switches, other compilers don't use them always in the same way, and most likely there are in the mean time compilers out there that can produce such decorations also for non stdcall calling convention. So this automagic trickery was removed from newer LabVIEW versions.
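To make the @224 in the function name concrete: with the Visual C stdcall decoration described above, the number after @ is the count of argument bytes on the stack, and on 32-bit x86 each argument narrower than 4 bytes still occupies a full 4-byte slot. The following is only a sketch of that arithmetic in Python; the helper name and the size table are made up for illustration and are not part of LabVIEW or any compiler toolchain.

```python
# Illustrative only: compute the "@N" suffix that Visual C appends to
# 32-bit stdcall function names. N is the number of argument bytes on
# the stack; every argument occupies at least one 4-byte slot.

SLOT = 4  # stack slot size on 32-bit x86

# Natural sizes in bytes for the types used in the prototype above.
TYPE_SIZES = {"uint8_t": 1, "uint16_t": 2, "float": 4}


def stdcall_suffix(arg_types):
    """Return the byte count used in the stdcall name decoration."""
    total = 0
    for t in arg_types:
        size = TYPE_SIZES[t]
        # Round each argument up to a whole number of 4-byte slots.
        total += -(-size // SLOT) * SLOT
    return total


# Argument types of _inputPM, in the order given in the prototype.
INPUT_PM_ARGS = (
    ["uint8_t", "uint16_t", "uint8_t"]  # arg1..arg3
    + ["float"] * 14                    # arg4..arg17
    + ["uint16_t"] * 9                  # arg18..arg26
    + ["float"] * 21                    # arg27..arg47
    + ["uint16_t"] * 5                  # arg48..arg52
    + ["float"] * 4                     # arg53..arg56
)

print(len(INPUT_PM_ARGS), stdcall_suffix(INPUT_PM_ARGS))  # prints "56 224"
```

The 56 arguments at 4 bytes each give 224, which matches the decoration on _inputPM@224 and is consistent with the function having been built as stdcall.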
I do think it could be considered a bug in the code that upgrades LabVIEW VIs that it uses the calling convention configured in the dialog instead of the calling convention earlier LabVIEW versions used automagically, but it is an esoteric corner case.
What Compact Fieldpoint controller do you have? If it's anything newer than 20xx or 21xx forget it. The 22xx controllers use a PPC CPU and VxWorks operating system and can never get a Windows DLL to operate properly. If it is a 20xx controller it's still highly likely that DLL can not even get loaded into LabVIEW as it likely relies on other runtime libraries and possibly Win32 APIs not present in the Pharlap ETS runtime kernel used on those controllers.
There is a tool to check a DLL for incompatible imports that are not present on specific ETS RT systems. And this list summarizes the RT system used on the various NI controllers.
Rolf Kalbermatter
CIT Engineering Netherlands
a division of Test & Measurement Solutions

Asm and node shutdown

Hi,
Due to maintenance of our IT infrastructure, we need to perform a clean shutdown of one node of our 2-node RAC.
This maintenance will last about one week.
What is the correct procedure to shut down the node, ASM and the diskgroup of this node?
After startup, will the diskgroup be rebalanced automatically?
thanks

You can refer to the link below as well:
Oracle DBA and RAC DBA Expert: How to STOP and START processes in Oracle RAC and Log Directory Structure
Regards,
http://www.oracleracexpert.com

One node RAC pause/hang/block on other node shutdown

Hi,
We have a Java application running on Linux servers connecting to a 10.2.0.1 RAC cluster, also on Linux. When the application starts, it opens a pool of connections to the database, and these are used throughout the lifetime of the application. One server connects to one RAC node.
AppA - DBA
AppB - DBB
When we shut down one node, the application connecting to that node stops, which is what we would expect in this configuration.
What is strange is that the other application blocks for 63 seconds and then continues. So it is as if the database is blocking, or the database connections are blocking.
We are not using TAF, FAN, FCN, LB, VIPs or any special features, just simple lightweight JDBC from one server to one database. In fact, I do not think we are unwittingly using any of these features; we have them switched off.
john

user1788323 wrote:
What is strange is that the other application blocks for 63 seconds and then continues. So it is like the database is blocking, or the database connections are blocking.
How have you determined/diagnosed the 63s blocking? (More details in this regard may shed some light on the problem.)
Assuming that the block is server side, two basic reasons come to mind.
Networking issue - the CRS on the surviving node has to perform certain functions, like switching the VIP of the node that left the cluster to a surviving cluster node. The listener may need to re-register services. A local firewall may need to be dynamically reconfigured for supporting the new failed-over VIP. Etc.
Thus these could result in some kind of delay or issue in the network layer that you are seeing from the client side.
Infrastructure issue. If the actual client request via JDBC reaches the server process, and it is slow in responding, then that is not a network issue - instead some underlying service or s/w layer that the server process needs to use to perform the client request is busy for those 63s.
This could be related to the Interconnect, the shared I/O storage layer or something along those lines. For example, how does the Interconnect and/or SAN switch re-act when a server node is powered down or rebooted?
There's not really sufficient information to make anything but guesses. You will need to isolate the problem with further testing.
I have seen similar problems with 10.1.0.3 CRS and RAC when a node is evicted from the cluster. In this case the "hung" period was in excess of 15 minutes and only for new connections (Listener unable to hand off to dedicated servers or dispatchers). Existing connections worked fine however and were unaware of any problems. But part of the issue in this case was a poor (outdated) driver layer - and also the last time we used proprietary binary drivers (kernel modules) from 3rd party vendors that results in a tainted (and very fixed and rigid) Linux kernel. Today we're sticking with an OpenSource driver layer only for Linux.

Node shutdown in sun cluster

I have a two-node cluster configured for high availability.
My resource group is online on node1,
so the resources (the logical hostname resource and my application resource) are online on node1.
When node1 is shut down, the resource group fails over to node2 and comes online. But when node1 is brought back, the logical hostname is plumbed up on node1 as well, so both nodes have the logical hostname plumbed up (per ifconfig -a output),
which is causing the problem.
My question is: does Sun Cluster check the status of resources in the resource group on the node where my resource group is offline? If it does, what additional configuration is required?

This is a pretty old post and you probably have the answer by now (or have abandoned all hope), but it seems to me that what you want is to reset the resource/resource group dependencies for node1.
If node1 is coming online under logicalhostname without all the resources coming up, you just don't have the resource dependency set up. You can do this in the SunPlex Manager GUI pretty easily. That should make it so the node doesn't get added to the logicalhostname resource group until X dependencies are met (what X stands for is entirely up to you; I didn't see the resource you want to come up first listed.)

TAF Failover issue when RAC node shutdown

Dear all,
We have a two-node RAC database. We use sqlplus from a client laptop to test RAC TAF failover when one node is being shut down. There is a tnsnames.ora file with TAF settings on the client laptop.
First we connect to the RAC database via sqlplus. At the "SQL>" prompt, we type "select instance_name from v$instance;" to see which instance we are truly connected to. Then we shut down the node we are connected to. Meanwhile, if we type "select instance_name from v$instance;" again right away, sqlplus sometimes hangs with no response; but if we wait until the VIP fails over to the other node and then type "select instance_name from v$instance;", it always shows the other node's instance name, so we know the session has successfully failed over to the healthy node.
My question is :
Can RAC TAF always fail the session over to another healthy node with no downtime? Or are there circumstances in which the session hangs and needs to reconnect?
Any help would be appreciated.

Hi, thanks for your help.
There are many things you have to do but if you don't have the knowledge will be difficult.
Right. The cluster was set up by consultants, but we're still trying to pick up basic Oracle knowledge by self-study...
Found some messages about eviction in old cssd logs in $ORA_CRS_HOME/log/cssd/. Will further dig into it.
Yes, we tried rebooting different nodes many times in the clusters before, without any problem.
Thanks a lot.
/ST Wong

Server node shutdown/restart

We have four dialog instances, each with 3 server nodes (server0, server1 and server2). Is it possible to shut down/restart the server nodes of a dialog instance individually? If so, will stopping or restarting one bring down the system?
For instance, if I stop server2, then server0 and server1 should stay up.
Please help with guidance on the procedure.

Hi,
Go to /sapmnt/<SID>/profile directory.
Then run jcmon pf=<instance profile>. It will show you a few options; select 20. The next screen shows a number of menu-based options. There is one option to restart the process. Select it, enter the index number of the server node you want to restart, and it will restart it.
Thanks
Sunny

Shutdown and crash issues on MDD

I am continuing to investigate a problem where my MDD crashes - (and maybe it's called a hang?). The program I'm working with stops functioning and I end up with a spinning beachball that cannot be force quit - making powering down with the power button the only option. This crash also seems to happen if more than a few windows are open - it just seems arbitrary, and happens often enough that I can't pinpoint a problem. I should say that I have two other MDDs that work fine. This one I bought 2nd hand because all it needed was a PSU, which I replaced. I also added a new startup drive (installing OS 10.4.11) and 2 two-TB SATA drives with a Firmtek/Seritek 1S2 controller card.
In response to my earlier post on this, BD Aqua suggested I try looking at the Console to see if I could find any related information or reports there (I'm looking in the system.log). When the problem happened again, as it invariably does, I rebooted, looked in Console and found: MDNSResponder: NOTE Wide-Area Service Discovery disabled to avoid crashing defective DNS relay 192....followed by 5 more digits... And I also see: exited abnormally: Hangup And: Startup items failed to properly start.
I did a brief search through the forums and came up with a possibly similar message query in a post - answered in fact by BD Aqua.
I'm wondering if this solution could solve my problem as it seems to have done for the party who asked the question below (although they have a different model, and the issue didn't cause them a problem as it has me):
defective DNS relay?
Posted: Feb 17, 2009 2:18 PM
mDNSResponder: NOTE: Wide-Area Service Discovery disabled to avoid crashing defective DNS relay
This has appeared in the system profiler system log for quite a while now - noticed it a few months ago - as far as I can tell it's not causing trouble. There's an address at the end after the word "relay" xxx.xxx.x.x (with actual numbers of course).
Can anyone enlighten me as to what it means?
Thanks!
G5 Powermac dual 2.5G 4G ram 160G and 300G hds
BDAqua
Re: defective DNS relay?
Posted: Feb 17, 2009 2:44 PM in response to: estabroo
Hi estabroo,
Quit any Browser or Internet APP like Mail, etc.
Then in Network>TCP/IP>DNS Servers:, for that Interface, paste these 3 numbers in...
208.67.222.222
208.67.220.220
127.0.0.1
Use Terminal & Terminal command to Flush DNS Cache Tiger to 10.5.1...
lookupd -flushcache
ENTER
Reboot.
Anyway, that's an excerpt from what I found. The writer said it seemed to make the sys. profiler msg. go away...but in my case, the computer stops responding.
I have also tried different mice in different USB ports - same problem

I wanted to know if that was something to try...
I'd try it...
Terminal commands to Flush DNS Cache Tiger to 10.5.1...
lookupd -flushcache
Leopard 10.5.2 or greater...
dscacheutil -flushcache
Can it be a hardware issue?
Easily.
or is there anything else I could try?
Did we see if it shuts down when running in Safe Mode? Safe Boot from the HD, (holding Shift key down at bootup).
Did you try Applejack...
http://www.versiontracker.com/dyn/moreinfo/macosx/19596
After installing, reboot holding down CMD+S (⌘+S), then when the DOS-like prompt shows, type in...
applejack AUTO
Then let it do all 5 of its things.
At least it'll eliminate some questions if it doesn't fix it.
The 6 things it does are...
Correct any Disk problems.
Repair Permissions.
Clear out Cache Files.
Repair/check several plist files.
Dump the VM files for a fresh start.
Trash old Log files.
First reboot will be slower, sometimes 2 or 3 restarts will be required for full benefit... my guess is files relying upon other files relying upon other files!
Disconnect the USB cable from any UPS so the system doesn't shut down in the middle of the process.

(Running Windows 7 Home Premium) When Firefox 4.0 hangs (forcing a shutdown) or crashes (the screen has even gone to black) my windows fail to restore most of the time.

On the occasions when the windows fail to restore, I usually see a blank start page with something to the effect of "about:restore" or "about:restore session" in the URL line. I've tried pressing enter, but it didn't help. Other times, I've immediately tried going to the History menu, but the "Restore Previous Session" menu pick is faded out. I followed the help posted here to activate the restore windows feature, but the windows still fail to restore most of the time. I'm not a techie, so I'd really appreciate some easy to understand help.

The DMP file is inconclusive. You could run driver verifier to find the underlying issue but WPR is often easier.
In order to diagnose your problem you will need to download and install the below
Install the WPT (Windows Performance Toolkit)
http://www.microsoft.com/en-us/download/details.aspx?id=30652
Help with installation (if needed) is here
When you have, open an elevated command prompt and type the following
WPRUI.exe (which is the windows performance recorder) and check off the boxes for the following:
First level triage, CPU usage, Disk IO.
If your problem is not CPU or HD, then check off the relevant box(es) as well (for example, networking or registry). Please configure yours as per the snip below.
Click Start.
It will reboot 3 times, record the data to a file, and tell you its name and location.
Zip the file and upload to us on Onedrive (or any file sharing service) and give us a link to it in your next post.
These crashes were related to memory corruption (probably caused by a driver).
Please run these tests to verify your memory and find which driver is causing the problem.
If you are overclocking (pushing the components beyond their design) you should revert to default at least until the crashing is solved. If you don't know what it is, you probably are not overclocking.
Since it is more likely to be a driver please run verifier first.
1-Driver verifier (for complete directions see our wiki here)
If verifier does not find the issue we can move on to this.
2-Memtest. (You can read more about running memtest here)
Co-Authored by JMH3143
Wanikiya and Dyami--Team Zigzag

Stopping RAC instances on shutdown (10g - RHEL3)

Any advice on the correct way to script instance stop in cluster nodes?
As soon as a node comes up, CRS automatically starts its instances. However, according to alert logs, they are not being cleanly stopped (if left as configured by the installation procedure):
Errors in file /opt/oracle/admin/oragis/bdump/oragi1_pmon_4778.trc:
ORA-00481: LMON process terminated with error
PMON: terminating instance due to error 481
Beginning crash recovery of 1 threads
In other words, on node shutdown, we need to issue
srvctl stop instance -d ... -i ...
What's the proper place to issue this command before the node shuts down?
Thanks,
Ivan


Should I upgrade to a MBP? and what about these crashes?

Right, so I bought a macbook about a week ago, and I'm just not sure if I should upgrade to a MB Pro or not?
I'm not a huge gamer, the only game I can think of playing on XP is Flight Simulator X, which does need a good video card, but I haven't played any games for 6 months or so, so I don't think that justifies getting a MBP.
Other than that, how is XP performance on the Core 2 Duo macbook? Does the integrated graphics cause problems? I suppose it is just like any XP laptop you can buy nowadays, as most of them also come with integrated graphics.
The only thing that does make me feel like getting a MBP, however, is that this thing is a bit too happy to ramp up the fans sometimes. I'm thinking this is because of the integrated graphics? Am I wrong about this? I noticed that the fan came on easily in the following scenarios:
1. Playing Macbreak Weekly H.264 video podcast fullscreen within iTunes 7. If not fullscreen, the fan doesn't come on. BTW, I noticed VLC plays this a lot smoother than iTunes. I don't know if VLC makes the fans come on though.
2. Running iEatBrainz (a rosetta version). the fan came on while it was scanning my music collection, maybe this isn't related to integrated graphics?
3. Playing an HD trailer, the fans come on. If this is due to integrated graphics then isn't this a future problem when there will be more HD around to play? the fan coming on all the time will be very annoying. I'm not really going to be playing heavy HD anytime soon though, but in the future it might be an issue.
Two other separate problems I hope you guys can help me with:
Problem 1:
It has crashed 3 times in the past week.
Crash 1: I think this was due to VirtueDesktops and trying to use an external display at the same time. Basically the WindowServer crashed and I lost the Apple menu and right-side clock and stuff. The dock was there and I could click on icons but nothing would launch. I had to force a shutdown by holding down the power button.
Crash 2: Once again I think this is due to the external display. I was connected to the external display when I put the computer to sleep and it hung, the display was still on the external monitor but I couldn't click on any menus or dock so I had to force shutdown.
Crash 3: Just 2 days ago I disconnected from the external display and was using it on the couch when I closed the lid because I had something to do, and as I walked away I heard the startup Apple sound! It had restarted just because I closed the lid! That was really odd.
Before I close this problem off, though I must add that I initially used this laptop w/o an external display and keyboard+mouse for 3 days straight without any problems. I also did a hardware test (extended) and everything is fine. So I think 10.4.8 and external display + USB might be broken?
Just in case, I have reinstalled the whole OS a day ago and so far with multiple sleep cycles and doing normal work (not connected to external stuff) it is just fine so I don't think there is a hardware issue.
Problem 2:
The other problem with the Macbook is that the button on the trackpad is just slightly sort of off-balance? I can make it move forward/backward vertically if I trace my finger vertically across it, is this normal? Sometimes if I click on the left hand side, it sort of gets stuck but immediately comes back up, so it's not a huge problem but annoying nonetheless. Does this warrant me getting a replacement unit? Could the button just stop working in the future, and then they have to replace the whole top casing don't they? Note that this happens only sometimes, and is hard to reproduce if I try to do it manually, it just happens *once in a while* but the button does work properly and everything and the right side of the trackpad is just fine.
Other than this the laptop is fine, and I simply don't have the budget to pay for a MBP right now but I've been waiting to get a mac for so long now that I just don't want to return this and wait a few more months ...
White 2.0 Ghz Macbook Core 2 Duo Mac OS X (10.4.8)

You are right, the MacBook has endured multiple sleep/wake cycles now, and I finally plugged in a USB mouse yesterday and it still seems fine. The touchpad button is still a little loose, however.
As for the extended test, I did that and there are no problems, so what do I do now?
I have decided against upgrading for now, and I don't want to go to the hassle of taking it all the way down to the apple store, but even more stories today of macbook cases slightly warped after time and cracks appearing also scare me, how common is this? Or is it just careless users who might have dropped the macbook?
If this case bending or warping or cracking and ramping up fans is common, I might as well save all my money and return it. I don't want to argue with Apple later on about fixing things, I've read horror stories there as well ...
and yes, I'm in a forum where people only post problems but there seems to be too many, or am I wrong?
White 2.0 Ghz Macbook Core 2 Duo Mac OS X (10.4.8)

RAC Linux DB Servers Shutdown When Patching from 11.1.0.6 to 11.1.0.7

Hello.
We have been successfully running a RAC 11.1.0.6.0 64-bit DB on Linux Redhat DB servers. We have had this DB running RHEL version 4 at kernel version 2.6.9-78.0.1.ELsmp. We also had successfully installed the January 2009 family CPU patch. A few months ago we began building a new 64-bit 11.1.0.6.0 RAC cluster set. This new cluster was on a different OS (a patch level up: kernel version is 2.6.9.78.ELsmp (Update 7)). To our knowledge, no other OS/HW differences existed between the older, first RAC cluster and the new (2nd) one.
After the SAs handed over the new RAC DB servers to us, we installed the 11.1.0.6.0 clusterware without problem. We then began to install the clusterware 11.1.0.7.0 patch. Here is the problem. When we patch the clusterware up to 11.1.0.7.0, the last thing we are supposed to do is run the root shell script. When we run this script, it starts and never returns. In fact, by initiating this script, the node on which it is running actually crashes/shuts down. It does not reboot. It shuts down and we literally have to go to that server and push the on-off button. We can repeat this on any of the nodes and all of the machines respond the same. We have repeated this cluster build three different times. The first time, the root script finished, but each of the nodes later began shutting down selectively afterwards and we'd have to restart them. On the 2nd and 3rd attempts, the root script would not finish and the node would crash.
Oracle support hinted that we needed to install glibc-2.3.4-2.40 (or above) or upgrade to EL4u7. Well, we are at EL4u7 now and the version of glibc is glibc-2.3.4-2.41. We look fine on that front. They also indicated that we need to install both the 32-bit and 64-bit versions of glibc. When we run "rpm -qa | grep -i glibc", we see:
glibc-2.3.4-2.41:
glibc-devel-2.3.4-2.41
glibc-devel-2.3.4-2.41
glibc-2.3.4-2.41:
We wonder why we see TWO (2) entries repeated. Does anyone know why? Also, how can one tell if both the 32-bit and 64-bit versions of glibc exist on our 64-bit machines? Do the above duplicate entries imply this?
MORE IMPORTANTLY, has anyone experienced this node/DB server crash when patching up from 11.1.0.6.0 to 11.1.0.7.0? We are out of answers on our end. We cannot even look at the system logs or the ocssd logs because the servers crash immediately upon restarting them.
Thank you.
Matt

Matt,
When I had this issue and the first node was going into a reboot loop, I remember seeing a message related to a 'tick' of some kind. Based on that, I thought the cluster was rebooting because of a quorum issue with the vote disk. I reasoned that if the clusterware could not find the vote disk, it would not start, so the quorum issue that was leading to the reboot could not occur. So I renamed the vote disk and the OCR (although that was not necessary), then uninstalled and reinstalled the clusterware.
In /etc/inittab, the OCSSD daemon is started at run levels 3 and 5 with the fatal option. That means that if the Cluster Synchronization Service (OCSSD daemon) is terminated, the server reboots. If I had the option of starting the server at init level 1, I would open /etc/inittab, comment out the following lines, start the server at init level 3, and then uninstall and reinstall the clusterware.
h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
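The commenting-out step could be sketched roughly as follows. This is only an illustration, not an Oracle-documented procedure: comment_out_clusterware is a hypothetical helper, and it prints the modified file to stdout so the original is left untouched until the result has been reviewed.

```shell
#!/bin/sh
# Hypothetical helper: print an inittab-style file with the three clusterware
# respawn entries (init.evmd, init.cssd, init.crsd) commented out.
# The input file itself is not modified.
comment_out_clusterware() {
    sed -e '/^h[0-9]*:.*\/etc\/init\.d\/init\.evmd /s/^/#/' \
        -e '/^h[0-9]*:.*\/etc\/init\.d\/init\.cssd /s/^/#/' \
        -e '/^h[0-9]*:.*\/etc\/init\.d\/init\.crsd /s/^/#/' "$1"
}

# Usage (as root, from run level 1), keeping a backup first:
#   cp /etc/inittab /etc/inittab.bak
#   comment_out_clusterware /etc/inittab.bak > /etc/inittab
```

Writing from the backup copy rather than editing in place means the original entries can be restored by copying /etc/inittab.bak back once the clusterware has been reinstalled.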
Edited by: jatin1 on Apr 27, 2009 11:30 PM

Node.js loss of permission to write/create log files

We have been operating Node.js as a worker role cloud service. To track server activity, we write log files (via log4js) to C:\logs
Originally the logging was configured with size-based roll-over, e.g. a new file every 20MB. I noticed that on some servers the sequencing was uneven:
socket.log <-- current active file
socket.log.1
socket.log.3
socket.log.5
socket.log.7
It should be:
socket.log.1
socket.log.2
socket.log.3
socket.log.4
Whenever the sequence was uneven, I noticed that the beginning of each file showed the Node process had been restarted. The Windows Azure event log further indicated that the worker role hosting mechanism found node.exe to have terminated abruptly.
With no other information to hint at what exactly was happening, I thought there was some fault in the log4js roll-over implementation (updating to the latest versions did not help). I subsequently switched to date-based roll-over mode, saw that roll-over happened every midnight, and was happy with it.
However, some weeks later I realised the roll-over was (not always, but pretty predictably) only happening every alternate midnight.
socket.log-2014-06-05
socket.log-2014-06-07
socket.log-2014-06-09
And each file again revealed that on the midnights when the roll-over did not happen, node.exe was crashing again. Additional logging on the uncaughtException and exit events showed nothing, which seems to suggest node.exe was killed by an external influence (e.g. a process kill), but it was unfathomable that anything in the OS would want to kill node.exe.
Additionally, having two instances in the cloud service, we observe both node.exe processes crashing within minutes of each other. Always. However, if we had two server instances brought up on different days, then the "schedule" for crashing would be offset by the difference of the instance launch dates.
Unable to trap more details of what was going on, we tried a different logging library, winston. winston has the additional feature of logging uncaughtExceptions, so it was not necessary to log that manually. Since winston does not have date-based roll-over, we went back to size-based roll-over, which obviously meant no more midnight crash.
Eventually, I spotted a random midday crash today. It did not coincide with a size-based rollover event, but winston was able to log an interesting uncaughtException.
"date": "Wed Jun 18 2014 06:26:12 GMT+0000 (Coordinated Universal Time)",
"process": {
"pid": 476,
"uid": null,
"gid": null,
"cwd": "E:\\approot",
"execPath": "E:\\approot\\node.exe",
"version": "v0.8.26",
"argv": ["E:\\approot\\node.exe", "E:\\approot\\server.js"],
"memoryUsage":
{ "rss": 80433152, "heapTotal": 37682920, "heapUsed": 31468888 }
"os":
{ "loadavg": [0, 0, 0], "uptime": 163780.9854492 }
"trace": [],
"stack": ["Error: EPERM, open 'c:\\logs\\socket1.log'"],
"level": "error",
"message": "uncaughtException: EPERM, open 'c:\\logs\\socket1.log'",
"timestamp": "2014-06-18T06:26:12.572Z"
Interesting question: the Node process _was_ writing to socket1.log all along, so why would there be a sudden EPERM error?
On restart it could resume writing to the same log file. In the previous cases, it would seem the process lacked permission to create a new log file.
Any clues on what could possibly cause this? On a "scheduled" basis per server? Given that it happens so frequently and in sync with sister instances in the cloud service, something is happening behind the scenes that I cannot put my finger on.
thanks
The melody of logic will always play out the truth. ~ Narumi Ayumu, Spiral

Hi,
It is strange. From your description, how many instances does your worker role have? Do you store the log files on your VM's local disk? To avoid this problem, the best choice is to store your log files in Azure blob storage. If you do this, all log files will be stored in blob storage. For how to use Azure blob storage, please see this doc:
http://azure.microsoft.com/en-us/documentation/articles/storage-introduction/
Please try it.
If I misunderstood, please let me know.
Regards,
Will
