Abort rate at Send2VRU node

All,
Can anyone provide expert feedback or documented acceptable abort rates at the Send To VRU node? I have heard from many field engineers that some level of aborts is expected at this node when establishing the VRU leg, but nothing is documented; experience only suggests it should be a fraction of a percent.
As I understand it, the following interaction occurs when establishing a VRU leg:
1. Call arrives on the gateway --> CUSP/CUPS --> CallServer (CVP) --> signaling to ICM via PG Request Instruction.
2. The PG sends it up to the Call Router, which, based on Dialed Number - Call Type - Scheduled Script, starts the script and hits the Send To VRU node.
3. The Network VRU label comes back and the signaling goes to the gateway, hits the VRU leg dial peer and starts the bootstrap service (TCL).
4. This calls the bootstrap VXML, which pings the Call Server; the Call Server replies with VXML over HTTP, and the gateway then submits a fairly large VXML document with call data to the Call Server.
5. Now the VRU leg is established and, from the Call Router's point of view, the call exits the check port and moves on.
The fact that these are aborts leads me to believe they are network-related issues somewhere between steps 3 and 5.
Is it theoretically possible to eliminate every one of these? My current environment takes ~150k calls a day and sits at a ~0.5% abort rate on this node. I'm unclear whether this is an issue to be pursued or whether it is within typical limits.
Any input/experiences would be much appreciated
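For scale, the figures quoted in the question (~150k calls per day, ~0.5% aborts at the node) can be turned into absolute numbers. This is only a back-of-the-envelope sketch; both inputs are the approximate values from the post, and the function name is made up for illustration.

```python
# Approximate figures quoted in the post.
calls_per_day = 150_000
abort_rate = 0.005  # ~0.5%


def expected_aborts(calls: int, rate: float) -> float:
    """Expected number of aborted calls for a given volume and abort rate."""
    return calls * rate


aborts_per_day = expected_aborts(calls_per_day, abort_rate)
print(f"~{aborts_per_day:.0f} aborts per day")
print(f"~{aborts_per_day / 24:.0f} aborts per hour if spread evenly over the day")
```

Aborts are unlikely to be spread evenly in practice, so a per-hour breakdown from real reporting data would say more than this average.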

OK, thanks. So how should I install the hardware drivers on the other PC? Will the NI device drivers CD install on the other PC if LabVIEW is not installed there, or will I have to download the drivers from the internet? When I install the LabVIEW Run-Time Engine on the other PC, doesn't it install all the hardware drivers as well?
My second query:
We at the National Chemical Laboratory, Pune have licensed copies of both LabVIEW 7.0 and LabVIEW 7.1, but there is no "build application or shared libraries" option in the Tools menu. I have tried reinstalling from the CD again and again to get this option, but nothing changed. What could be the reason?
Best regards.

Similar Messages

Cancel or abort invoke node that hangs up

Hello,
I use an invoke node to grab an image from a FireWire camera. The camera is in trigger mode.
The application hangs when the trigger signal fails. Running the VI in highlight execution mode shows that "IICImangingControl, MemorySnapImage" (upper left) hangs.
The VI cannot be stopped with the Abort Execution button.
Target: abort the invoke node on timeout (error handling).
Question: how can I abort a hung invoke node?
timon
Attachments:
GrabPicture 02.tg.vi (97 KB)

The camera code is an ActiveX/COM DLL that you are calling out to. Unfortunately, when a thread leaves LabVIEW and calls external code, we have no control over it anymore. When you abort the VI, we'll stop the code execution, but we have to wait for the COM DLL to release the thread.
Being able to do this is outside our control, unfortunately. You should look at the camera API, or talk to the vendor, about a method that either has a timeout or is asynchronous in nature.
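To illustrate the "timeout or asynchronous" idea in general terms, here is a minimal Python sketch (not LabVIEW, and not the camera vendor's API). A blocking call runs in a worker thread and the caller stops waiting after a timeout. `fake_snap_image` is a hypothetical stand-in for the blocking camera call. As the reply says, the worker itself cannot be killed; the caller only stops waiting for it.

```python
import queue
import threading
import time


def call_with_timeout(fn, timeout_s, *args):
    """Run fn(*args) in a daemon worker thread and wait at most timeout_s.

    Returns (True, result) if fn finished in time, (False, None) otherwise.
    The worker is NOT killed on timeout; it keeps running until fn returns.
    In this sketch an exception inside fn also shows up as a timeout.
    """
    result_q = queue.Queue(maxsize=1)

    def worker():
        result_q.put(fn(*args))

    threading.Thread(target=worker, daemon=True).start()
    try:
        return True, result_q.get(timeout=timeout_s)
    except queue.Empty:
        return False, None


def fake_snap_image(delay_s):
    """Hypothetical stand-in for a blocking camera call waiting on a trigger."""
    time.sleep(delay_s)
    return "image"


print(call_with_timeout(fake_snap_image, 1.0, 0.01))  # trigger arrives in time
print(call_with_timeout(fake_snap_image, 0.05, 0.5))  # caller gives up waiting
```

The design choice here is that the caller regains control while the blocked thread is left alone, which matches the constraint described above: external code has to release the thread on its own.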
Brian Tyler
http://detritus.blogs.com/lycangeek

RAT replay issue on 2 node RAC

Hi Guys,
I am working on RAT with a 2-node RAC (dbname: SERVER1).
I started a workload capture on SERVER1 (directory name: CAPTURE) and executed a simple update script such as:
SQL> update employee set empname='A' where empid=1;
SQL> commit;
Later, I created another directory named REPLAY on the same 2-node RAC (dbname: SERVER1) and moved all captured files from the CAPTURE directory on both nodes to the REPLAY directory on both nodes.
Before starting the workload replay, I made the following change to the employee table:
SQL> update employee set empname='Z' where empid=1;
SQL> commit;
Then I started the replay process, which completed successfully in a few minutes.
But when I look at the employee record with empid 1, the empname field has not changed; it still shows 'Z'. The AWR and cursor cache also show no execution of this UPDATE statement.
Can someone suggest what the issue is here?
Edited by: 851602 on Jun 23, 2011 9:54 AM

There is no such thing as 'Urgent help' in a forum of volunteers.
Your usage of 'Urgent' is insulting and rude.
Also databases don't have fields. Are you sure you aren't working with a punch card system?
Sybrand Bakker
Senior Oracle DBA

Crs doesn't start on second node

Guys,
RAC on 2 nodes
Release 10.2.0.5.0
Solaris 10
There was a problem with the interconnect cable, which has since been fixed. One of the nodes was evicted and all resources were moved to the other node. Once the problem was solved I tried to start the cluster stack on the evicted node, but without success. When I run crs_stat -t I get the infamous CRS-0184.
I have checked the ocr and olsnodes; ocr seems to be fine and the second node is recognized as part of the cluster.
cluvfy comp ocr -n lenin,trotsky -verbose
Verifying OCR integrity
Checking OCR integrity...
Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations.
Uniqueness check for OCR device passed.
Checking the version of OCR...
OCR of correct Version "2" exists.
Checking data integrity of OCR...
Data integrity check for OCR passed.
OCR integrity check passed.
Verification of OCR integrity was successful.
oracle@trotsky > cluvfy comp nodereach -n lenin,trotsky -srcnode trotsky -verbose
Verifying node reachability
Checking node reachability...
Check: Node reachability from node "trotsky"
Destination Node Reachable?
lenin yes
trotsky yes
Result: Node reachability check passed from node "trotsky".
I have checked /var/adm/messages and crs and cssd log but I didn't see anything that stands out...
I have also tried to delete the content of /var/tmp/.oracle and restart crs but again to no success.
I have read in another thread in this forum that crs problems are either related to the interconnect or ocr/voting disks but as mentioned before they seem to be OK.
I'm running out of ideas, any suggestions?
One of the nodes now holds both vip addresses:
bge0:1: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
inet 192.168.191.184 netmask ffffff00 broadcast 192.168.191.255
bge0:2: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 2
inet 192.168.191.182 netmask ffffff00 broadcast 192.168.191.255
Do I need to manually reconfigure the interface so that one VIP is held by the second node again?
Thanks in advance for your help

Cheers for your input!
The suggested cluvfy command passed all checks except daemon liveness (as expected).
Excerpts from the different logs:
alert.log
2010-11-19 13:12:35.033
[cssd(4928)]CRS-1605:CSSD voting file is online: /dev/rdsk/c1t500601604BA03AEAd0s5. Details in /u01/crs/10.2.0/crs_1/log/trotsky/cssd/ocssd.log.
2010-11-19 13:12:35.050
[cssd(4928)]CRS-1605:CSSD voting file is online: /dev/rdsk/c1t500601604BA03AEAd0s4. Details in /u01/crs/10.2.0/crs_1/log/trotsky/cssd/ocssd.log.
2010-11-19 13:12:35.062
[cssd(4928)]CRS-1605:CSSD voting file is online: /dev/rdsk/c1t500601604BA03AEAd0s6. Details in /u01/crs/10.2.0/crs_1/log/trotsky/cssd/ocssd.log.
cssd.log
[    CSSD]2010-11-19 13:16:47.059 [21] >WARNING: clssnmLocalJoinEvent: takeover aborted due to ALIVE node on Disk
[    CSSD]2010-11-19 13:16:47.059 [21] >WARNING: clssnmRcfgMgrThread: not possible to join the cluster. Please reboot the node.
[    CSSD]2010-11-19 13:16:47.059 [21] >WARNING: clssnmReconfigThread: state(1) clusterState(0) exit
I have tried rebooting the node but that did not help.
crsd.log
2010-11-19 13:53:49.652: [ CRSRTI][1] CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2010-11-19 13:53:50.889: [ COMMCRS][1802]clsc_connect: (1009ac310) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_trotsky_))
2010-11-19 13:53:50.889: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
2010-11-19 13:53:50.890: [ CRSRTI][1] CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2010-11-19 13:53:51.899: [    CRSD][1][PANIC] CRSD exiting: Could not init the CSS context
2010-11-19 13:53:51.899: [    CRSD][1] Done.
Does this help?
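When digging through a long ocssd.log, it can help to pull out only the WARNING lines. The following is a small Python sketch that does this; the regular expression is written against the line layout shown in the excerpts above and is an assumption about that format, not an Oracle-documented one. The name `warnings_in` is made up for illustration.

```python
import re

# Matches lines laid out like the cssd.log excerpts above, e.g.
# [    CSSD]2010-11-19 13:16:47.059 [21] >WARNING: clssnmLocalJoinEvent: ...
CSSD_LINE = re.compile(
    r"^\[\s*CSSD\](?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<thread>\d+)\] >(?P<level>[A-Z]+): (?P<func>\w+): (?P<msg>.*)$"
)


def warnings_in(lines):
    """Yield (timestamp, function, message) for each WARNING line."""
    for line in lines:
        m = CSSD_LINE.match(line.strip())
        if m and m.group("level") == "WARNING":
            yield m.group("ts"), m.group("func"), m.group("msg")


sample = [
    "[    CSSD]2010-11-19 13:16:47.059 [21] >WARNING: clssnmLocalJoinEvent: takeover aborted due to ALIVE node on Disk",
    "[    CSSD]2010-11-19 13:16:47.059 [21] >WARNING: clssnmRcfgMgrThread: not possible to join the cluster. Please reboot the node.",
    "[    CSSD]2010-11-19 13:16:47.059 [21] >WARNING: clssnmReconfigThread: state(1) clusterState(0) exit",
]

for ts, func, msg in warnings_in(sample):
    print(ts, func, msg)
```

To run it against a real file, pass an open file object instead of `sample`; lines that do not fit the pattern are skipped.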

GSD and ONS shutting down automatically when listener is started.

Issue with RAC database Node 2
OS: Windows 2003 Server (64-bit)
Problem 1:
After patching Oracle from 10.2.0.3.0 P31 to 10.2.0.4.0 P35, the second database node did not start properly. On startup, the node hangs with a blue screen.
Cause:
When starting the cluster-related services, the second node was not able to communicate with the first node through the cluster interconnect network (heartbeat).
The node tries to ping the heartbeat several times and then gets evicted from the cluster, resulting in a BSOD.
We found that this type of node eviction had occurred several times.
After several evictions of Node 2, Node 1 locks the voting disk to protect it from corruption.
While starting, Node 2 again tries to access the voting disk to join the cluster.
But since those files are locked by Node 1, Node 2 cannot access them, and this also results in a BSOD.
The above information was found in the ocssd logs.
ocssd.log
=========
WARNING: clssnmLocalJoinEvent: takeover aborted due to ALIVE node on Disk
WARNING: clssnmRcfgMgrThread: not possible to join the cluster. Please reboot the node.
Solution:
The only way to release the lock was to reboot Node 1. We rebooted Node 1 and the lock was released.
Both nodes were then able to access the configuration files and there were no more BSODs on Node 2.
All the cluster-related services started without any issues and the node joined the cluster.
Problem 2:
After Problem 1 was solved, we noticed one more issue with Node 2.
On Node 2, when the listener is started, the GSD and ONS node applications die.
When we stop the listener, GSD and ONS start.
Also, when I try to start the instance on Node 2, it hangs in the startup command.
Oracle is not able to start the instance. The configuration assistants, srvctl, etc. were also not working.
I was told that if GSD is not started, DBCA, srvctl and other commands do not work.
But how do I resolve the issue described in Problem 2?
Please help me.
Thanks & Regards,
Mahesh Menon,
Oracle DBA,
Key Information Technology LLC.

Hi Bibii and welcome to Discussions,
First off, you have posted in the Mac Pro section, not the MacBook Pro section here: http://discussions.apple.com/category.jspa?categoryID=190
Anyway, is this really a shutdown, or is it just sleeping?
If it is sleeping, check the Energy Saver settings in System Preferences for when your MBP should go to sleep when inactive.
Regards
Stefan

Display line items in Cells of a table- Smartform

Hello All,
Can anyone help me achieve the requirement below?
I need to print an invoice with a smartform. I am displaying line items from a TABLE node by looping over VBRP's internal table. There are 12 cells, and the 5th cell contains a number of discount rates.
I am displaying these discount rates from a LOOP node by looping over KONV's internal table within that cell. The 7th cell has two values, which must be aligned one to the top right corner of the cell and the other to the bottom right.
The issue in detail:
CELL-5: Contains an unknown number of values, to be displayed middle-justified within the cell for each item (POSNR) of the main item table.
CELL-7: Contains two fixed values: KWERT at the top right of CELL-7 and NETWR at the bottom right of CELL-7.
I have managed to display the sub-items in CELL-5 in the middle by setting SPACE BEFORE/SPACE AFTER in the PARAGRAPH FORMAT of the SMART STYLE.
Requirement: Irrespective of the number of sub-items in CELL-5, the two values in CELL-7 must be aligned flexibly, one at the top right and the other at the bottom right of the cell. The same applies to the other items (POSNR) of the main VBRP item table.
The rows of the main table must be flexible enough to shrink when there are fewer sub-items in CELL-5.
The values in CELL-7 should be positioned according to the contents of CELL-5.
Thanks in advance for your valuable suggestions.

Hi,
One approach: while filling the 5th cell, count the number of values in it for each record (COUNTER), then fill a new internal table (ITAB_CELL7) as follows.
DATA: FLAG     TYPE C,
      COUNTER1 TYPE I.
" COUNTER = number of values printed in cell 5 for the current item.
CLEAR COUNTER1.
DO COUNTER TIMES.
  COUNTER1 = COUNTER1 + 1.
  CLEAR FLAG.
  IF COUNTER1 = 1.
    " First row: KWERT goes at the top of cell 7.
    WA_CELL7-VALUE = KWERT.
    APPEND WA_CELL7 TO ITAB_CELL7.
    FLAG = 'X'.
  ENDIF.
  IF COUNTER1 = COUNTER.
    " Last row: NETWR at the bottom (assumed from the stated requirement).
    WA_CELL7-VALUE = NETWR.
    APPEND WA_CELL7 TO ITAB_CELL7.
    FLAG = 'X'.
    EXIT.
  ENDIF.
  IF FLAG NE 'X'.
    " Middle rows: blank filler so cell 7 keeps pace with cell 5.
    CLEAR WA_CELL7-VALUE.
    APPEND WA_CELL7 TO ITAB_CELL7.
  ENDIF.
ENDDO.
Adjust the code as needed for your data.
In cell 7, under a loop node, print the ITAB_CELL7 values in a text element and set up a paragraph format for the alignment.
If it does not work, send me the XML file for your smartform.
Regards,
Antim
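The padding idea in this reply can be sketched outside ABAP as well. The Python below is an illustration only, assuming the intent is KWERT on the first row, NETWR on the last row, and blank rows in between so that CELL-7 has as many rows as CELL-5 has sub-items. The function name and the sample values are made up.

```python
def cell7_rows(n_subitems, kwert, netwr):
    """Return one CELL-7 value per sub-item row.

    KWERT goes on the first row, NETWR on the last, blanks in between.
    """
    if n_subitems <= 0:
        return []
    if n_subitems == 1:
        # Only one row exists in CELL-5: this sketch still emits both values.
        return [kwert, netwr]
    return [kwert] + [""] * (n_subitems - 2) + [netwr]


# Example: an item with four discount rows in CELL-5.
for row in cell7_rows(4, "10.00", "90.00"):
    print(repr(row))
```

Whether a single-row item should grow to two rows is a layout decision for the form; the sketch only shows one way to handle it.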

How to disable the kernel SCTP daemon in Solaris 10

We have a third-party SCTP implementation on Solaris 8 and Solaris 9, and we are now moving it to Solaris 10. When it sends the INIT message, the kernel SCTP of the peer node sends an ABORT back to our node, so I need to disable kernel SCTP on the peer node.
Thanks in advance.
twinceone

Init levels no longer form a 1:1 mapping with how SMF will run things.
The default milestone is 'all'. You can override this on the boot line. I'm not sure how you might change it within the filesystem.
Darren

My NI-6031E card acts as if the multiplexer is stuck.

Dear Folks,
My NI-6031E 64-channel card acts as if the multiplexer is sticking. I see the same voltages on little groups of 2-3 channels. All the channels are connected to inputs, and there are no overvoltages. The output impedance of the sources should be low, because each is the output of an op amp in the optical sensor boxes we build in-house. So all the things listed in the knowledgebase as possible causes are eliminated. I even tried replacing the cable, based on a suggestion from someone on info-labview, but that had no effect. I also ran the online E-series diagnostic and everything checked out fine.
Now here is the kicker. These exact same systems, with the same sensor hardware and the same cards, have run for the last 5 years in Macs with never a sign of this problem. Now that we have decided to migrate to PCs running WinXP, we run into this very large snag. Any helpful ideas out there?
Thanks,
Alvin.

AWMoore,
I have an idea about what might be happening.
When you switched from Mac machines to PC machines, you changed the DAQ driver. If you are currently using NIDAQmx 7.5, your interchannel delay will be set to a minimum value. In previous versions of the DAQ driver the interchannel delay was set to round robin by default, which meant the interchannel delay was always the maximum value. The decrease in interchannel delay might not be giving your channels enough settling time, thus causing your problem.
There are a few easy ways to test this:
1. Change the interchannel delay (convert clock rate) using property nodes, so that it is longer.
2. Remove every other sensor and short the input to the DAQ card it was using (the problem should go away).
To clarify, interchannel delay is the amount of time between each channel's sample.
For example (I am pretty sure you know what interchannel delay is, but this is for other people who might be reading the posting):
If you are using 10 analog inputs and are sampling at 10 Hz, a sample from each channel will be taken every 100 ms. Since the channels are multiplexed to the same ADC, the samples cannot be taken at the exact same time, so we have to take 10 samples one after another within the 100 ms. In the past we would spread the 10 samples out across the entire 100 ms, which would result in one channel being sampled every 10 ms. Now, we take the samples as fast as possible. This means that at the beginning of the 100 ms period we take 10 samples with a delay of something like 5 us between each channel. After the 10 samples are taken (this would take 5 us times 10 samples = 50 us) the DAQ board does nothing for the rest of the 100 ms period (99.95 ms would be left over) and then repeats the process for the next 100 ms period.
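The arithmetic in this example can be reproduced with a few lines of Python. This is a sketch of the numbers only, not of the DAQmx API; the function names are made up, and the 5 us convert period is the "something like 5 us" figure from the example rather than a specification value. Like the example, it counts one convert period per channel.

```python
def round_robin_delay_s(sample_rate_hz, n_channels):
    """Interchannel delay when samples are spread evenly over the sample period."""
    return 1.0 / (sample_rate_hz * n_channels)


def burst_timing_s(sample_rate_hz, n_channels, convert_period_s):
    """Return (burst duration, idle time) when channels are sampled back to back."""
    period = 1.0 / sample_rate_hz
    burst = n_channels * convert_period_s
    return burst, period - burst


rate_hz, channels = 10, 10
print(f"round robin: {round_robin_delay_s(rate_hz, channels) * 1e3:.1f} ms between channels")

burst, idle = burst_timing_s(rate_hz, channels, 5e-6)
print(f"burst: {burst * 1e6:.0f} us sampling, {idle * 1e3:.2f} ms idle")
```

The gap between the two modes (10 ms versus roughly 5 us per channel) is what determines how much settling time each channel gets.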
The new setup for the interchannel delay means there is less tolerance for a sensor's impedance, but it also means that the samples are taken at much closer to the same time as one another. Like I said above, this is just the default setting and it can be changed within LabVIEW using property nodes (DAQmx Timing Property Node: more->AI Convert->Rate).
This is just a shot in the dark, but it was what came to my mind as a possible cause for your problem. I realize you said your sensors are low impedance, but they may not be low enough. It is worth a try.
Lorne Hengst
Application Engineer
National Instruments

Reverse replication not implemented for action:

Hi, we see a persistent error reported in our replication log:
Reverse replication not implemented for action: TEST. Remote outbox path: /var/replication/outbox/1338320761214
The node mentioned has cq:repActionType = TEST.
As a consequence, the replication of nodes to the publisher is aborted, and our author and publisher appear to be out of sync. Has anyone come across this before? How can it be prevented?
If I delete that node in the outbox it seems to resolve the issue, but I'd like to know what caused it to appear in the first place to make sure we don't end up seeing this issue in our production environment.
Any help is greatly appreciated.

I don't remember exactly when the TEST nodes are created, but they only appear when some tests are run.
When it happened to me I had to delete the node once, and it never happened again.
D.

CRS issue ?? and solve with restarting CRS on all DB Instances

Dear All,
I am running a two-node Oracle 10g RAC on RHEL4.
After an abrupt power-off, one database instance came up properly but the second instance did not start.
When I ran ./crsctl check crs, no reply came back. I checked the OCSSD log and found it printing "clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(1016) LATS(121714) Disk lastSeqNo(1016)".
I tried to stop and start the CRS service, but nothing worked.
Then I stopped the CRS service on the running instance. The moment I stopped the service on the running node, the first node's OCSSD log printed "clssnmSetupAckWait: node(2) is ALIVE", and the first node's CRS service started and its instance came up.
After that I started the CRS service on the second instance, and CRS and the Oracle instance started without any issue.
In this way both database instances started working.
But in this scenario I faced a service interruption, because I restarted the CRS service on the running instance.
Please advise whether the way I solved the issue was right, or whether another way is available.
For clarification, I have included the OCSSD logs.
[    CSSD]2009-07-14 15:21:02.556 [1115699552] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(1044) LATS(149784) Disk lastSeqNo(1044)
[    CSSD]2009-07-14 15:21:02.899 [1220598112] >TRACE: clssnmRcfgMgrThread: Local Join
[    CSSD]2009-07-14 15:21:02.899 [1220598112] >WARNING: clssnmLocalJoinEvent: takeover aborted due to ALIVE node on Disk
[    CSSD]2009-07-14 15:21:08.572 [1115699552] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(1050) LATS(155794) Disk lastSeqNo(1050)
[    CSSD]2009-07-14 15:21:09.574 [1115699552] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(1051) LATS(156794) Disk lastSeqNo(1051)
[    CSSD]2009-07-14 15:21:09.911 [1220598112] >TRACE: clssnmRcfgMgrThread: Local Join
[    CSSD]2009-07-14 15:21:09.912 [1220598112] >WARNING: clssnmLocalJoinEvent: takeover aborted due to ALIVE node on Disk
[    CSSD]2009-07-14 15:21:13.584 [1115699552] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(1055) LATS(160814) Disk lastSeqNo(1055)
[    CSSD]2009-07-14 15:21:16.924 [1220598112] >TRACE: clssnmRcfgMgrThread: Local Join
[    CSSD]2009-07-14 15:21:16.924 [1220598112] >WARNING: clssnmLocalJoinEvent: takeover succ
[    CSSD]2009-07-14 15:21:16.924 [1220598112] >TRACE: clssnmDoSyncUpdate: Initiating sync 1
[    CSSD]2009-07-14 15:21:16.924 [1220598112] >TRACE: clssnmDoSyncUpdate: diskTimeout set to (57000)ms
[    CSSD]2009-07-14 15:21:16.924 [1220598112] >TRACE: clssnmSetupAckWait: Ack message type (11)
[    CSSD]2009-07-14 15:21:16.924 [1220598112] >TRACE: clssnmSetupAckWait: node(2) is ALIVE
[    CSSD]2009-07-14 15:21:16.924 [1220598112] >TRACE: clssnmSendSync: syncSeqNo(1)
[    CSSD]2009-07-14 15:21:16.924 [1220598112] >TRACE: clssnmWaitForAcks: Ack message type(11), ackCount(1)
[    CSSD]2009-07-14 15:21:16.924 [1147169120] >TRACE: clssnmHandleSync: diskTimeout set to (57000)ms
[    CSSD]2009-07-14 15:21:16.924 [1147169120] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] srcName[etopup145] seq[1] sync[1]
[    CSSD]2009-07-14 15:21:16.924 [1220598112] >TRACE: clssnmWaitForAcks: done, msg type(11)
[    CSSD]2009-07-14 15:21:16.924 [1220598112] >TRACE: clssnmDoSyncUpdate: node(2) is transitioning from joining state to active state
[    CSSD]2009-07-14 15:21:16.924 [1220598112] >TRACE: clssnmSetupAckWait: Ack message type (13)
[    CSSD]2009-07-14 15:21:16.924 [1220598112] >TRACE: clssnmSetupAckWait: node(2) is ACTIVE
[    CSSD]2009-07-14 15:21:16.925 [1220598112] >TRACE: clssnmWaitForAcks: Ack message type(13), ackCount(1)
[    CSSD]2009-07-14 15:21:16.925 [1147169120] >TRACE: clssnmSendVoteInfo: node(2) syncSeqNo(1)
[    CSSD]2009-07-14 15:21:16.925 [1220598112] >TRACE: clssnmWaitForAcks: done, msg type(13)
[    CSSD]2009-07-14 15:21:16.925 [1220598112] >TRACE: clssnmCheckDskInfo: Checking disk info...
[    CSSD]2009-07-14 15:21:16.925 [1220598112] >TRACE: clssnmCheckDskInfo: diskTimeout set to (200000)ms
[    CSSD]2009-07-14 15:21:16.925 [1220598112] >TRACE: clssnmEvict: Start
[    CSSD]2009-07-14 15:21:16.925 [1220598112] >TRACE: clssnmWaitOnEvictions: Start
[    CSSD]2009-07-14 15:21:16.925 [1220598112] >TRACE: clssnmSetupAckWait: Ack message type (15)
[    CSSD]2009-07-14 15:21:16.925 [1220598112] >TRACE: clssnmSetupAckWait: node(2) is ACTIVE
[    CSSD]2009-07-14 15:21:16.925 [1220598112] >TRACE: clssnmSendUpdate: syncSeqNo(1)
[    CSSD]2009-07-14 15:21:16.925 [1220598112] >TRACE: clssnmWaitForAcks: Ack message type(15), ackCount(1)
[    CSSD]2009-07-14 15:21:16.925 [1147169120] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[    CSSD]2009-07-14 15:21:16.925 [1147169120] >TRACE: clssnmUpdateNodeState: node 1, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[    CSSD]2009-07-14 15:21:16.925 [1147169120] >TRACE: clssnmUpdateNodeState: node 2, state (2/3) unique (1247564029/1247564029) prevConuni(0) birth (1/1) (old/new)
[    CSSD]2009-07-14 15:21:16.925 [1147169120] >USER: clssnmHandleUpdate: SYNC(1) from node(2) completed
[    CSSD]2009-07-14 15:21:16.925 [1147169120] >USER: clssnmHandleUpdate: NODE 2 (etopup145) IS ACTIVE MEMBER OF CLUSTER
[    CSSD]2009-07-14 15:21:16.925 [1147169120] >TRACE: clssnmHandleUpdate: diskTimeout set to (200000)ms
[    CSSD]2009-07-14 15:21:16.925 [1220598112] >TRACE: clssnmWaitForAcks: done, msg type(15)
[    CSSD]2009-07-14 15:21:16.925 [1220598112] >TRACE: clssnmDoSyncUpdate: Sync 1 complete!
[    CSSD]2009-07-14 15:21:16.931 [2538647328] >USER: NMEVENT_SUSPEND [00][00][00][00]
[    CSSD]2009-07-14 15:21:16.939 [1231087968] >TRACE: clssgmReconfigThread: started for reconfig (1)
[    CSSD]2009-07-14 15:21:16.939 [1231087968] >USER: NMEVENT_RECONFIG [00][00][00][04]
[    CSSD]2009-07-14 15:21:16.939 [1231087968] >TRACE: clssgmEstablishConnections: 1 nodes in cluster incarn 1
[    CSSD]2009-07-14 15:21:16.940 [1189128544] >TRACE: clssgmPeerListener: connects done (1/1)
[    CSSD]2009-07-14 15:21:16.940 [1231087968] >TRACE: clssgmEstablishMasterNode: MASTER for 1 is node(2) birth(1)
[    CSSD]2009-07-14 15:21:16.940 [1231087968] >TRACE: clssgmChangeMasterNode: requeued 0 RPCs
[    CSSD]2009-07-14 15:21:16.940 [1231087968] >TRACE: clssgmMasterCMSync: Synchronizing group/lock status
[    CSSD]2009-07-14 15:21:16.940 [1231087968] >TRACE: clssgmMasterSendDBDone: group/lock status synchronization complete
[    CSSD]CLSS-3000: reconfiguration successful, incarnation 1 with 1 nodes
[    CSSD]CLSS-3001: local node number 2, master node number 2
[    CSSD]2009-07-14 15:21:16.940 [1231087968] >TRACE: clssgmReconfigThread: completed for reconfig(1), with status(1)
[    CSSD]2009-07-14 15:21:17.005 [1157658976] >TRACE: clsc_event_hndlr: (0x78c040) answer error, rc 15
[    CSSD]2009-07-14 15:21:17.005 [1157658976] >TRACE: clsc_event_hndlr: (0x78c040) answer error, rc 15
[    CSSD]2009-07-14 15:21:17.005 [1157658976] >TRACE: clsc_event_hndlr: (0x78c040) answer error, rc 15
[    CSSD]2009-07-14 15:21:17.034 [1157658976] >TRACE: clssgmCommonAddMember: clsomon joined (2/0x1000000/#CSS_CLSSOMON)
[    CSSD]2009-07-14 15:21:51.368 [1147169120] >TRACE: clssnmConnComplete: probe from node 1, your version: 10.2.1.2
, support PENDINA: 1
[    CSSD]2009-07-14 15:21:51.368 [1147169120] >TRACE: clssnmConnComplete: MSGSRC 1, type 5, node 1, flags 0x0001, con 0x2a9788bf40, probe (nil)
[    CSSD]2009-07-14 15:21:51.368 [1147169120] >TRACE: clssnmConnComplete: node 1, etopup144, con(0x2a9788bf40), probcon((nil)), ninfcon((nil)), node unique 1247564705, prev unique 0, msg unique 1247564705 node state 0
[    CSSD]2009-07-14 15:21:51.368 [1147169120] >TRACE: clssnmsendConnAck: node 1, node state 0
[    CSSD]2009-07-14 15:21:51.368 [1147169120] >TRACE: clssnmConnComplete: connecting to node 1 (con 0x2a9788bf40), ninfcon (0x2a9788bf40), state (0)
[    CSSD]2009-07-14 15:21:51.368 [1147169120] >TRACE: clssnmConnComplete: connected to node 1 (con 0x2a9788bf40), ninfcon (0x2a9788bf40), state (0), flag (1037)
[    CSSD]2009-07-14 15:21:51.974 [1220598112] >TRACE: clssnmDoSyncUpdate: Initiating sync 2
[    CSSD]2009-07-14 15:21:51.974 [1220598112] >TRACE: clssnmDoSyncUpdate: diskTimeout set to (57000)ms
[    CSSD]2009-07-14 15:21:51.974 [1220598112] >TRACE: clssnmSetupAckWait: Ack message type (11)
[    CSSD]2009-07-14 15:21:51.974 [1220598112] >TRACE: clssnmSetupAckWait: node(1) is ALIVE
[    CSSD]2009-07-14 15:21:51.974 [1220598112] >TRACE: clssnmSetupAckWait: node(2) is ALIVE
[    CSSD]2009-07-14 15:21:51.974 [1220598112] >TRACE: clssnmSendSync: syncSeqNo(2)
[    CSSD]2009-07-14 15:21:51.974 [1147169120] >TRACE: clssnmHandleSync: diskTimeout set to (57000)ms
[    CSSD]2009-07-14 15:21:51.974 [1220598112] >TRACE: clssnmWaitForAcks: Ack message type(11), ackCount(2)
[    CSSD]2009-07-14 15:21:51.974 [1147169120] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] srcName[etopup145] seq[5] sync[2]
[    CSSD]2009-07-14 15:21:51.974 [2538647328] >USER: NMEVENT_SUSPEND [00][00][00][04]
[    CSSD]2009-07-14 15:21:51.975 [1220598112] >TRACE: clssnmWaitForAcks: done, msg type(11)
[    CSSD]2009-07-14 15:21:51.975 [1220598112] >TRACE: clssnmDoSyncUpdate: node(1) is transitioning from joining state to active state
[    CSSD]2009-07-14 15:21:51.975 [1220598112] >TRACE: clssnmSetupAckWait: Ack message type (13)
[    CSSD]2009-07-14 15:21:51.975 [1220598112] >TRACE: clssnmSetupAckWait: node(1) is ACTIVE
[    CSSD]2009-07-14 15:21:51.975 [1220598112] >TRACE: clssnmSetupAckWait: node(2) is ACTIVE
[    CSSD]2009-07-14 15:21:51.975 [1220598112] >TRACE: clssnmWaitForAcks: Ack message type(13), ackCount(2)
[    CSSD]2009-07-14 15:21:51.975 [1147169120] >TRACE: clssnmSendVoteInfo: node(2) syncSeqNo(2)
[    CSSD]2009-07-14 15:21:51.975 [1220598112] >TRACE: clssnmWaitForAcks: done, msg type(13)
[    CSSD]2009-07-14 15:21:51.975 [1220598112] >TRACE: clssnmCheckDskInfo: Checking disk info...
[    CSSD]2009-07-14 15:21:51.975 [1220598112] >TRACE: clssnmEvict: Start
[    CSSD]2009-07-14 15:21:51.975 [1220598112] >TRACE: clssnmWaitOnEvictions: Start
[    CSSD]2009-07-14 15:21:51.975 [1220598112] >TRACE: clssnmSetupAckWait: Ack message type (15)
[    CSSD]2009-07-14 15:21:51.975 [1220598112] >TRACE: clssnmSetupAckWait: node(1) is ACTIVE
[    CSSD]2009-07-14 15:21:51.975 [1220598112] >TRACE: clssnmSetupAckWait: node(2) is ACTIVE
[    CSSD]2009-07-14 15:21:51.975 [1220598112] >TRACE: clssnmSendUpdate: syncSeqNo(2)
[    CSSD]2009-07-14 15:21:51.976 [1147169120] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[    CSSD]2009-07-14 15:21:51.976 [1147169120] >TRACE: clssnmUpdateNodeState: node 1, state (2/3) unique (1247564705/1247564705) prevConuni(0) birth (2/2) (old/new)
[    CSSD]2009-07-14 15:21:51.976 [1147169120] >TRACE: clssnmUpdateNodeState: node 2, state (3/3) unique (1247564029/1247564029) prevConuni(0) birth (1/1) (old/new)
[    CSSD]2009-07-14 15:21:51.976 [1147169120] >USER: clssnmHandleUpdate: SYNC(2) from node(2) completed
[    CSSD]2009-07-14 15:21:51.976 [1147169120] >USER: clssnmHandleUpdate: NODE 1 (etopup144) IS ACTIVE MEMBER OF CLUSTER
[    CSSD]2009-07-14 15:21:51.976 [1147169120] >USER: clssnmHandleUpdate: NODE 2 (etopup145) IS ACTIVE MEMBER OF CLUSTER
[    CSSD]2009-07-14 15:21:51.976 [1147169120] >TRACE: clssnmHandleUpdate: diskTimeout set to (200000)ms
[    CSSD]2009-07-14 15:21:51.976 [1231087968] >TRACE: clssgmReconfigThread: started for reconfig (2)
[    CSSD]2009-07-14 15:21:51.976 [1231087968] >USER: NMEVENT_RECONFIG [00][00][00][06]
[    CSSD]2009-07-14 15:21:51.976 [1220598112] >TRACE: clssnmWaitForAcks: Ack message type(15), ackCount(1)
[    CSSD]2009-07-14 15:21:51.976 [1220598112] >TRACE: clssnmWaitForAcks: done, msg type(15)
[    CSSD]2009-07-14 15:21:51.976 [1220598112] >TRACE: clssnmDoSyncUpdate: Sync 2 complete!
[    CSSD]2009-07-14 15:21:51.977 [1231087968] >TRACE: clssgmEstablishConnections: 2 nodes in cluster incarn 2
[    CSSD]2009-07-14 15:21:52.041 [1189128544] >TRACE: clssgmInitialRecv: (0x7dda20) accepted a new connection from node 1 born at 2 active (2, 2), vers (10,3,1,2)
[    CSSD]2009-07-14 15:21:52.041 [1189128544] >TRACE: clssgmInitialRecv: conns done (2/2)
[    CSSD]2009-07-14 15:21:52.041 [1231087968] >TRACE: clssgmEstablishMasterNode: MASTER for 2 is node(2) birth(1)
[    CSSD]2009-07-14 15:21:52.041 [1231087968] >TRACE: clssgmMasterCMSync: Synchronizing group/lock status
[    CSSD]2009-07-14 15:21:52.044 [1231087968] >TRACE: clssgmMasterSendDBDone: group/lock status synchronization complete
[    CSSD]CLSS-3000: reconfiguration successful, incarnation 2 with 2 nodes
[    CSSD]CLSS-3001: local node number 2, master node number 2
[    CSSD]2009-07-14 15:21:52.046 [1231087968] >TRACE: clssgmReconfigThread: completed for reconfig(2), with status(1
Edited by: Sumit2 on Jul 14, 2009 4:08 PM

- Check for /tmp/crsctl.* files created while starting the cluster with "crsctl start crs". If any exist, check their contents ;)
- Check /var/log/messages
Clusterware Fails to Start With 'clssnmLocalJoinEvent: takeover aborted due to ALIVE node on Disk' in Ocssd.log (Metalink Doc ID: 845573.1)
Node02 is running the Oracle Clusterware stack on top of Veritas but node01 is running a completely Oracle stack.
1. Shutdown the Oracle Clusterware stack on node01
2. Change the link to point to /opt/ORCLcluster/lib/libskgxn2.so
3. Restart the Oracle Clusterware stack on node01
4. Shutdown and restart the Oracle Clusterware stack on node02

OPS on Dell Hardware or Other

We are considering installation of a department sized Oracle Parallel Server using PowerEdge 2450 servers and a PowerVault enclosure. I would be extremely interested in any successes/failures at all with OPS on linux with any hardware vendor. There is a huge lack of information to be found about OPS on linux.

> only issue I have so far with OPS on Linux is
> the oracm (cluster manager) and oranm (node monitor) processes seem to die
> occasionally with a shutdown abort... so the node has to be rebooted...
> (they don't want to come up otherwise)
In the admin guide of 8.1.7.0.1 for Linux, it says a shutdown abort will cause a node reset. That is normal.
I am really looking forward to Oracle 9i
Real Application Clusters for Linux, because in the presentation I've seen, it said write-to-write lock collisions will now be handled by Cache
Fusion, which means you don't have to do the
partitioning anymore to get the parallel server benefit.
That's very interesting; I would like to see how it works.

[FPGA] Loop rate very slow: Do FPGA I/O nodes in parallel loops block each other?

Hi,
I am using a cRIO-9075. Mod1 is an NI 9263, Mod2 is an NI 9227, Mod3 is an NI 9215.
Please see my attached VI or the screenshot.
The FPGA code is based on the "NI CompactRIO Waveform Reference Library" (that's the lower loop).
The upper loop was added by me and writes a waveform from block memory to the NI 9263 module (Mod1).
The lower loop samples data at 1 kHz. The control "AO Update Period" for the upper loop has a value of (for example) 10 (= µs).
The problem is that the upper loop runs much, much slower than it should. Once I disable the FPGA I/O node in the lower loop (as done in the attachments), it runs as fast as it should.
It seems to me that the FPGA I/O nodes are blocking each other. I tried to figure it out by reading through several NI documents, but so far I have no idea how to solve it.
Can you give me some advice? Any general tips about the VI?
Thanks!
Attachments:
FPGA Loop Rate.PNG ‏72 KB
FPGA Main.vi ‏251 KB

Hi, thanks so far.
Originally the control was inside the loop. Then I tested whether it makes a difference if it's outside.
OK, it really does seem to be the default value of "100000" for "AO Update Period".
Starting the VI directly works as expected. Having "AO Update Period" inside the loop makes it possible to change it while the VI is running.
But please see the attachment. When starting the FPGA through RT and setting the appropriate value, it does not seem to work. The oscilloscope shows the same behavior as if "AO Update Period" were 100000.
But when reading the value of "AO Update Period" afterwards (while the FPGA is running), it shows the expected value of "10".
Changing the default value to 10 works so far, but then I am not able to change it at run time (see attachment).
So the problem is: why is "Read/Write Control" not working here? Why is the default value still used?
Attachments:
FPGA Loop Rate 2.PNG ‏5 KB

Is this how i control frame rate with IMAQdr property node?

Is this how I control the frame rate with an IMAQdr property node? If not, can someone point me in the right direction? For some reason, it doesn't work.
Attachments:
pic.JPG ‏75 KB

Please see this related thread http://forums.ni.com/t5/LabVIEW/IMAQdr-Property-Node/m-p/1642950/highlight/false#M590168
Matt
Product Owner - NI Community
National Instruments

Election problem after repeated split-brains with two nodes

Hi,
I'm using a customized application based on the BDB 5.1.19 example (excxx_repquote)
with two sites: one MASTER and one SLAVE (client).
nsite=2
ack=quorum
- The master writes to quotedb at a rate of 10 txn per second.
- The test isolates the client from the master (split brain) and reconnects it after a random time between 1 s and 10 s.
The test runs fine about 10 times, but at some point the slave process receives DB_EVENT_REP_ELECTION_FAILED,
and the master enters election mode and never leaves CLIENT mode. Note that, to freeze the client's state, I chose to have the process kill itself (kill -9 on its own pid) when it receives that event.
Here is the verbose log on the master:
[1307872770:871621][6510/47655809107168] MASTER: rep_send_function returned: 110
[1307872770:973655][6510/47655809107168] MASTER: bulk_msg: Send buffer after copy due to PERM
[1307872770:973667][6510/47655809107168] MASTER: send_bulk: Send 266 (0x10a) bulk buffer bytes
[1307872770:973672][6510/47655809107168] MASTER: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 68 eid -1, type bulk_log, LSN [21][986648] perm
[1307872770:973693][6510/47655809107168] MASTER: will await acknowledgement: need 1
[1307872771:26623][6510/47655809107168] MASTER: rep_send_function returned: 110
[1307872771:126380][6510/1162996032] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type log, LSN [21][946345]
[1307872771:126407][6510/1162996032] MASTER: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 68 eid -1, type dupmaster, LSN [0][0] nobuf
[1307872771:126695][6510/1162996032] MASTER: rep_start: Found old version log 17
[1307872771:126753][6510/1162996032] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 68 eid -1, type newclient, LSN [0][0] nobuf
[1307872771:126833][6510/1183975744] CLIENT: starting election thread
[1307872771:126876][6510/1183975744] CLIENT: Start election nsites 2, ack 1, priority 100
[1307872771:126890][6510/1183975744] CLIENT: Election thread owns egen 69
[1307872771:127423][6510/1173485888] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type newclient, LSN [0][0]
[1307872771:130079][6510/1183975744] CLIENT: Tallying VOTE1[0] (2147483647, 69)
[1307872771:130113][6510/1183975744] CLIENT: Beginning an election
[1307872771:130134][6510/1183975744] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 68 eid -1, type vote1, LSN [21][986728] nobuf
[1307872771:130147][6510/1173485888] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 68 eid -1, type master_req, LSN [0][0] nobuf
[1307872771:130438][6510/1152506176] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type vote1, LSN [21][946437]
[1307872771:130460][6510/1162996032] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type alive, LSN [21][986728]
[1307872771:130467][6510/1152506176] CLIENT: Updating gen from 68 to 70
[1307872771:130482][6510/1162996032] CLIENT: Received ALIVE egen of 71, mine 69
[1307872771:130503][6510/1162996032] CLIENT: Election finished in 0.003602000 sec
[1307872771:130515][6510/1162996032] CLIENT: Election done; egen 70
[1307872771:130534][6510/1152506176] CLIENT: Received vote1 egen 71, egen 71
[1307872771:130581][6510/1152506176] CLIENT: Tallying VOTE1[0] (0, 71)
[1307872771:130593][6510/1089075520] CLIENT: starting election thread
[1307872771:130619][6510/1152506176] CLIENT: Incoming vote: (eid)0 (pri)100 ELECTABLE (gen)70 (egen)71 [21,946437]
[1307872771:130642][6510/1152506176] CLIENT: Not in election, but received vote1 0x282c 0x8
[1307872771:130674][6510/1089075520] CLIENT: Start election nsites 2, ack 1, priority 100
[1307872771:130692][6510/1089075520] CLIENT: Election thread owns egen 71
[1307872771:130704][6510/1194465600] CLIENT: starting election thread
[1307872771:130733][6510/1194465600] CLIENT: Start election nsites 2, ack 1, priority 100
[1307872771:132922][6510/1089075520] CLIENT: Tallying VOTE1[1] (2147483647, 71)
[1307872771:132949][6510/1089075520] CLIENT: Accepting new vote
[1307872771:132958][6510/1089075520] CLIENT: Beginning an election
[1307872771:132973][6510/1089075520] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid -1, type vote1, LSN [21][986728] nobuf
[1307872771:132985][6510/1194465600] CLIENT: election thread is exiting
[1307872771:133012][6510/1089075520] CLIENT: Tallying VOTE2[0] (2147483647, 71)
[1307872771:133037][6510/1089075520] CLIENT: Counted my vote 1
[1307872771:133048][6510/1089075520] CLIENT: Skipping phase2 wait: already got 1 votes
[1307872771:133060][6510/1089075520] CLIENT: Got enough votes to win; election done; (prev) gen 70
[1307872771:133071][6510/1089075520] CLIENT: Election finished in 0.002367000 sec
[1307872771:133084][6510/1089075520] CLIENT: Election done; egen 72
[1307872771:133111][6510/1089075520] CLIENT: Ended election with 0, e_th 1, egen 72, flag 0x2a2c, e_fl 0x0, lo_fl 0x6
[1307872771:133170][6510/1173485888] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type alive, LSN [0][0]
[1307872771:133187][6510/1173485888] CLIENT: Racing replication msg lockout, ignore message.
[1307872771:173744][6510/1162996032] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type vote2, LSN [0][0]
[1307872771:173769][6510/1162996032] CLIENT: Racing replication msg lockout, ignore message.
[1307872771:231593][6510/1183975744] CLIENT: Ended election with 0, e_th 0, egen 72, flag 0x2a2c, e_fl 0x0, lo_fl 0x1c
[1307872771:231629][6510/1183975744] CLIENT: election thread is exiting
[1307872777:443794][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
[1307872971:644194][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
[1307873165:844583][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
[1307873360:44955][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
[1307873554:245347][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
[1307873748:445736][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
[1307873942:646117][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
[1307874136:846509][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
.... and it stays in this state indefinitely.
My question is: why does the master suddenly turn into a CLIENT, and why does it never return to MASTER?
Thanks in advance ...
Here is the log for the client:
[1307872315:455113][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type log, LSN [21][984396]
[1307872315:455134][1282/1160603968] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type log, LSN [21][984483] perm
[1307872315:609962][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][984733] perm
[1307872315:764958][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][984986] perm
[1307872315:919962][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][985238] perm
[1307872316:75018][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][985491] perm
[1307872316:229959][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][985741] perm
[1307872316:384949][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][985993] perm
[1307872316:499899][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][986141] perm
[1307872316:539895][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type log, LSN [21][986221]
[1307872316:540078][1282/1171093824] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type log, LSN [21][986307]
[1307872316:540100][1282/1160603968] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type log, LSN [21][986394] perm
[1307872316:694950][1282/1171093824] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][986648] perm
[1307872316:847349][1282/1129134400] MASTER: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid -1, type log, LSN [21][946345]
[1307872316:847698][1282/1171093824] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type dupmaster, LSN [0][0]
[1307872316:847999][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type newclient, LSN [0][0]
[1307872316:848168][1282/1171093824] MASTER: rep_start: Found old version log 17
[1307872316:848222][1282/1181583680] CLIENT: Racing replication msg lockout, ignore message.
[1307872316:848398][1282/1171093824] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid -1, type newclient, LSN [0][0] nobuf
[1307872316:848504][1282/1192073536] CLIENT: starting election thread
[1307872316:848542][1282/1192073536] CLIENT: Start election nsites 2, ack 1, priority 100
[1307872316:848566][1282/1192073536] CLIENT: Election thread owns egen 71
[1307872316:849634][1282/1192073536] CLIENT: Tallying VOTE1[0] (2147483647, 71)
[1307872316:849654][1282/1192073536] CLIENT: Beginning an election
[1307872316:849680][1282/1192073536] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid -1, type vote1, LSN [21][946437] nobuf
[1307872316:851403][1282/1160603968] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type vote1, LSN [21][986728]
[1307872316:851448][1282/1160603968] CLIENT: Received vote1 egen 69, egen 71
[1307872316:851470][1282/1160603968] CLIENT: Received old vote 69, egen 71, ignoring vote1
[1307872316:851481][1282/1160603968] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid 0, type alive, LSN [21][986728] nobuf
[1307872316:851538][1282/1171093824] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type master_req, LSN [0][0]
[1307872316:851558][1282/1171093824] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid 0, type alive, LSN [0][0] nobuf
[1307872316:854254][1282/1160603968] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type vote1, LSN [21][986728]
[1307872316:854275][1282/1160603968] CLIENT: Received vote1 egen 71, egen 71
[1307872316:854317][1282/1160603968] CLIENT: Tallying VOTE1[1] (0, 71)
[1307872316:854339][1282/1160603968] CLIENT: Incoming vote: (eid)0 (pri)100 ELECTABLE (gen)70 (egen)71 [21,986728]
[1307872316:854353][1282/1160603968] CLIENT: Existing vote: (eid)2147483647 (pri)100 (gen)70 (sites)2 [21,946437]
[1307872316:854369][1282/1160603968] CLIENT: Accepting new vote
[1307872316:854379][1282/1160603968] CLIENT: Phase1 election done
[1307872316:854395][1282/1160603968] CLIENT: Voting for 0
[1307872316:854407][1282/1160603968] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid 0, type vote2, LSN [0][0] nobuf
[1307872317:960344][1282/1192073536] CLIENT: After phase 2: votes 0, nvotes 1, nsites 2
[1307872317:960389][1282/1192073536] CLIENT: Election finished in 1.111809000 sec
[1307872317:960401][1282/1192073536] CLIENT: Election done; egen 72
[1307872317:960412][1282/1192073536] CLIENT: Ended election with -30974, e_th 0, egen 72, flag 0x282c, e_fl 0x0, lo_fl 0x0
Kill me !!
--- my source
on the master I run manually :
txn_rate 1
loop_rate 10
loop 1 20000
/*-
 * See the file LICENSE for redistribution information.
 *
 * Copyright (c) 2001, 2010 Oracle and/or its affiliates. All rights reserved.
 *
 * $Id$
 */

/*
 * In this application, we specify all communication via the command line. In
 * a real application, we would expect that information about the other sites
 * in the system would be maintained in some sort of configuration file. The
 * critical part of this interface is that we assume at startup that we can
 * find out
 *      1) what our Berkeley DB home environment is,
 *      2) what host/port we wish to listen on for connections; and
 *      3) an optional list of other sites we should attempt to connect to.
 *
 * These pieces of information are expressed by the following flags.
 * -h home (required; h stands for home directory)
 * -l host:port (required; l stands for local)
 * -C or -M (optional; start up as client or master)
 * -r host:port (optional; r stands for remote; any number of these may be
 *     specified)
 * -R host:port (optional; R stands for remote peer; only one of these may
 *     be specified)
 * -a all|quorum (optional; a stands for ack policy)
 * -b (optional; b stands for bulk)
 * -n nsites (optional; number of sites in replication group; defaults to 0
 *     to try to dynamically compute nsites)
 * -p priority (optional; defaults to 100)
 * -v (optional; v stands for verbose)
 */
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <string>
#include <sstream>
#include <sys/types.h>
#include <signal.h>
#include <db_cxx.h>
#include "RepConfigInfo.h"
#include "dbc_auto.h"
using std::cout;
using std::cin;
using std::cerr;
using std::endl;
using std::ends;
using std::flush;
using std::istream;
using std::istringstream;
using std::ostringstream;
using std::string;
using std::getline;
#include <stdio.h>
#include <readline/readline.h>
#include <readline/history.h>
#define     CACHESIZE     (10 * 1024 * 1024)
#define     DATABASE     "quote.db"
#define     DATABASE2     "quote2.db"
const char *progname = "excxx_repquote";
#include <errno.h>
#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#define     snprintf          _snprintf
#define     sleep(s)          Sleep(1000 * (s))
extern "C" {
extern int getopt(int, char * const *, const char *);
extern char *optarg;
}
typedef HANDLE thread_t;
typedef DWORD thread_exit_status_t;
#define     thread_create(thrp, attr, func, arg)                    \
(((*(thrp) = CreateThread(NULL, 0,                         \
     (LPTHREAD_START_ROUTINE)(func), (arg), 0, NULL)) == NULL) ? -1 : 0)
#define     thread_join(thr, statusp)                         \
((WaitForSingleObject((thr), INFINITE) == WAIT_OBJECT_0) &&          \
GetExitCodeThread((thr), (LPDWORD)(statusp)) ? 0 : -1)
#else /* !_WIN32 */
#include <pthread.h>
typedef pthread_t thread_t;
typedef void* thread_exit_status_t;
#define     thread_create(thrp, attr, func, arg)                    \
pthread_create((thrp), (attr), (func), (arg))
#define     thread_join(thr, statusp) pthread_join((thr), (statusp))
#endif
// Struct used to store information in Db app_private field.
typedef struct {
     bool app_finished;
     bool in_client_sync;
     bool is_master;
     bool no_dummy_wr;
} APP_DATA;
static void log(const char *);
void *checkpoint_thread (void *);
void *log_archive_thread (void *);
void *dummy_write_thread (void *);
class RepQuoteExample {
public:
     RepQuoteExample();
     void init(RepConfigInfo* config);
     void doloop();
     int terminate();
     static void event_callback(DbEnv* dbenv, u_int32_t which, void *info);
     void print_stocks_size(Db *dbp);
private:
     // disable copy constructor.
     RepQuoteExample(const RepQuoteExample &);
     void operator = (const RepQuoteExample &);
     // internal data members.
     APP_DATA          app_data;
     RepConfigInfo *app_config;
     DbEnv          cur_env;
     thread_t ckp_thr;
     thread_t lga_thr;
     thread_t dmy_thr;
     // private methods.
     void print_stocks(Db *dbp);
     void print_env(DbEnv *dbenv);
     void prompt();
};
RepQuoteExample *g_runner=NULL;
RepConfigInfo *g_config=NULL;
class DbHolder {
public:
     DbHolder(DbEnv *env, const char *_dbname) : env(env)
     {
          dbp = 0;
          if (_dbname) dbname=_dbname;
          else dbname=DATABASE;
     }
     ~DbHolder() {
          try {
               close();
          } catch (...) {
               // Ignore: this may mean another exception is pending
          }
     }
     bool ensure_open(bool creating) {
          if (dbp)
               return (true);
          dbp = new Db(env, 0);
          u_int32_t flags = DB_AUTO_COMMIT;
          if (creating)
               flags |= DB_CREATE;
          try {
               //dbp->open(NULL, DATABASE, NULL, DB_BTREE, flags, 0);
               //dbp->open(NULL, dbname, NULL, DB_BTREE, flags, 0);
               dbp->open(NULL, NULL, dbname, DB_BTREE, flags, 0);
               return (true);
          } catch (DbDeadlockException e) {
          } catch (DbRepHandleDeadException e) {
          } catch (DbException e) {
               if (e.get_errno() == DB_REP_LOCKOUT) {
                    // Just fall through.
               } else if (e.get_errno() == ENOENT && !creating) {
                    // Provide a bit of extra explanation.
                    log("Stock DB does not yet exist");
               } else
                    throw;
          }
          // (All retryable errors fall through to here.)
          log("please retry the operation");
          close();
          return (false);
     }
     void close() {
          if (dbp) {
               try {
                    dbp->close(0);
                    delete dbp;
                    dbp = 0;
               } catch (...) {
                    delete dbp;
                    dbp = 0;
                    throw;
               }
          }
     }
     operator Db *() {
          return dbp;
     }
     Db *operator->() {
          return dbp;
     }
private:
     Db *dbp;
     DbEnv *env;
     const char *dbname;
};
class StringDbt : public Dbt {
public:
#define GET_STRING_OK 0
#define GET_STRING_INVALID_PARAM 1
#define GET_STRING_SMALL_BUFFER 2
#define GET_STRING_EMPTY_DATA 3
     int get_string(char **buf, size_t buf_len)
     {
          size_t copy_len;
          int ret = GET_STRING_OK;
          if (buf == NULL) {
               cerr << "Invalid input buffer to get_string" << endl;
               return GET_STRING_INVALID_PARAM;
          }
          // make sure the string is null terminated.
          memset(*buf, 0, buf_len);
          // if there is no string, just return.
          if (get_data() == NULL || get_size() == 0)
               return GET_STRING_OK;
          if (get_size() >= buf_len) {
               ret = GET_STRING_SMALL_BUFFER;
               copy_len = buf_len - 1; // save room for a terminator.
          } else
               copy_len = get_size();
          memcpy(*buf, get_data(), copy_len);
          return ret;
     }
     size_t get_string_length()
     {
          if (get_size() == 0)
               return 0;
          return strlen((char *)get_data());
     }
     void set_string(char *string)
     {
          set_data(string);
          set_size((u_int32_t)strlen(string));
     }
     StringDbt(char *string) :
          Dbt(string, (u_int32_t)strlen(string)) {};
     StringDbt() : Dbt() {};
     ~StringDbt() {};
     // Don't add extra data to this sub-class since we want it to remain
     // compatible with Dbt objects created internally by Berkeley DB.
};
Db *g_repquote=NULL;
RepQuoteExample::RepQuoteExample() : app_config(0), cur_env(0) {
     app_data.app_finished = 0;
     app_data.in_client_sync = 0;
     app_data.is_master = 0; // assume I start out as client
     app_data.no_dummy_wr = 0; // set to prevent the dummy write from running
}
int (*old_rep_process_message)
          __P((DB_ENV *, DBT *, DBT *, int, DB_LSN *));
int my_rep_process_message __P((DB_ENV *arg1, DBT *arg2, DBT *arg3, int arg4, DB_LSN *arg5))
{
     printf("EZ->>> my_rep_process_message:%p\n",(void *)arg5);
     // Chain to the original handler and pass its result back.
     return old_rep_process_message(arg1,arg2,arg3,arg4,arg5);
}
void RepQuoteExample::init(RepConfigInfo *config) {
     app_config = config;
     cur_env.set_app_private(&app_data);
     cur_env.set_errfile(stderr);
     app_data.no_dummy_wr=config->no_dummy_wr;
     if (app_data.no_dummy_wr)
          printf("No dummy !!!\n");
     //EZ->cur_env.set_errpfx(progname);
     cur_env.set_event_notify(event_callback);
     // Configure bulk transfer to send groups of records to clients
     // in a single network transfer. This is useful for master sites
     // and clients participating in client-to-client synchronization.
     if (app_config->bulk)
          cur_env.rep_set_config(DB_REP_CONF_BULK, 1);
     // Set the total number of sites in the replication group.
     // This is used by repmgr internal election processing.
     if (app_config->totalsites > 0)
          cur_env.rep_set_nsites(app_config->totalsites);
     // Turn on debugging and informational output if requested.
     if (app_config->verbose)
          cur_env.set_verbose(DB_VERB_REPLICATION, 1);
     cur_env.set_verbose(DB_VERB_REPMGR_MISC, 1);
     cur_env.set_verbose(DB_VERB_RECOVERY, 1);
     cur_env.set_verbose(DB_VERB_REPLICATION, 1);
     cur_env.set_verbose(DB_VERB_REP_ELECT, 1);
     cur_env.set_verbose(DB_VERB_REP_LEASE, 1);
     cur_env.set_verbose(DB_VERB_REP_SYNC, 1);
     cur_env.set_verbose(DB_VERB_REPMGR_MISC, 1);
     // Set replication group election priority for this environment.
     // An election first selects the site with the most recent log
     // records as the new master. If multiple sites have the most
     // recent log records, the site with the highest priority value
     // is selected as master.
     cur_env.rep_set_priority(app_config->priority);
     // Set the policy that determines how master and client sites
     // handle acknowledgement of replication messages needed for
     // permanent records. The default policy of "quorum" requires only
     // a quorum of electable peers sufficient to ensure a permanent
     // record remains durable if an election is held. The "all" option
     // requires all clients to acknowledge a permanent replication
     // message instead.
     cur_env.repmgr_set_ack_policy(app_config->ack_policy);
     // Set the threshold for the minimum and maximum time the client
     // waits before requesting retransmission of a missing message.
     // Base these values on the performance and load characteristics
     // of the master and client host platforms as well as the round
     // trip message time.
     cur_env.rep_set_request(20000, 500000);
     // Configure deadlock detection to ensure that any deadlocks
     // are broken by having one of the conflicting lock requests
     // rejected. DB_LOCK_DEFAULT uses the lock policy specified
     // at environment creation time or DB_LOCK_RANDOM if none was
     // specified.
     cur_env.set_lk_detect(DB_LOCK_DEFAULT);
     // The following base replication features may also be useful to your
     // application. See Berkeley DB documentation for more details.
     // - Master leases: Provide stricter consistency for data reads
     // on a master site.
     // - Timeouts: Customize the amount of time Berkeley DB waits
     // for such things as an election to be concluded or a master
     // lease to be granted.
     // - Delayed client synchronization: Manage the master site's
     // resources by spreading out resource-intensive client
     // synchronizations.
     // - Blocked client operations: Return immediately with an error
     // instead of waiting indefinitely if a client operation is
     // blocked by an ongoing client synchronization.
     cur_env.repmgr_set_local_site(app_config->this_host.host,
     app_config->this_host.port, 0);
     for ( REP_HOST_INFO *cur = app_config->other_hosts; cur != NULL;
          cur = cur->next) {
          cur_env.repmgr_add_remote_site(cur->host, cur->port,
               NULL, cur->peer ? DB_REPMGR_PEER : 0);
     }
     // Configure heartbeat timeouts so that repmgr monitors the
     // health of the TCP connection. Master sites broadcast a heartbeat
     // at the frequency specified by the DB_REP_HEARTBEAT_SEND timeout.
     // Client sites wait for message activity the length of the
     // DB_REP_HEARTBEAT_MONITOR timeout before concluding that the
     // connection to the master is lost. The DB_REP_HEARTBEAT_MONITOR
     // timeout should be longer than the DB_REP_HEARTBEAT_SEND timeout.
     cur_env.rep_set_timeout(DB_REP_HEARTBEAT_SEND, 5000000);
     cur_env.rep_set_timeout(DB_REP_HEARTBEAT_MONITOR, 10000000);
     // The following repmgr features may also be useful to your
     // application. See Berkeley DB documentation for more details.
     // - Two-site strict majority rule - In a two-site replication
     // group, require both sites to be available to elect a new
     // master.
     // - Timeouts - Customize the amount of time repmgr waits
     // for such things as waiting for acknowledgements or attempting
     // to reconnect to other sites.
     // - Site list - return a list of sites currently known to repmgr.
     // We can now open our environment, although we're not ready to
     // begin replicating. However, we want to have a dbenv around
     // so that we can send it into any of our message handlers.
     cur_env.set_cachesize(0, CACHESIZE, 0);
     cur_env.set_flags(DB_REP_PERMANENT, 1);
     //cur_env.set_flags(DB_TXN_WRITE_NOSYNC, 1);
/*     u_int32_t maxlocks=300000;
     if (maxlocks != 0)
          cur_env.set_lk_max_locks(maxlocks);
     u_int32_t maxlocks_o=300000;
     if (maxlocks_o != 0)
          cur_env.set_lk_max_objects(maxlocks_o);
     u_int32_t maxmutex=300000;
     if (maxmutex != 0)
          cur_env.mutex_set_max(maxmutex);
*/
     DbEnv          *m_env=&cur_env;
     m_env->set_flags(DB_TXN_NOSYNC, 1);
     m_env->set_lk_max_lockers(60000);
     m_env->set_lk_max_objects(60000);
     m_env->set_lk_max_locks(60000);
     m_env->set_tx_max(60000);
     //m_env->repmgr_set_ack_policy(DB_REPMGR_ACKS_NONE);
     m_env->rep_set_timeout(DB_REP_ACK_TIMEOUT, 50 * 1000); //50ms
     m_env->rep_set_timeout(DB_REP_CHECKPOINT_DELAY, 0);
     //m_env->rep_set_timeout(DB_REP_CONNECTION_RETRY, 30 * 1000 * 1000); // 30 seconds
     m_env->rep_set_timeout(DB_REP_ELECTION_TIMEOUT, 1 * 1000 * 1000); // 1 second
     m_env->rep_set_timeout(DB_REP_FULL_ELECTION_TIMEOUT, 5 * 1000 * 1000); // 5 seconds
     m_env->rep_set_timeout(DB_REP_CONNECTION_RETRY, 5 * 1000 * 1000);
     //m_env->rep_set_timeout(DB_REP_ELECTION_RETRY, 10 * 1000 * 1000); //10 seconds
     //m_env->rep_set_timeout(DB_REP_HEARTBEAT_MONITOR, 80 * 1000 * 1000); //80 seconds
     //m_env->rep_set_timeout(DB_REP_HEARTBEAT_SEND, 500 * 1000); //500 milli seconds
     //The minimum number of microseconds a client waits before requesting retransmission
     u_int32_t rep_req_min = 40000; //40 000 microsec = 40 mili
     //The maximum number of microseconds a client waits before requesting retransmission
     u_int32_t rep_req_max = 1280000;// 1 280 000 microsec = 1.28 sec
     u_int32_t rep_limit_gbytes = 0;
     u_int32_t rep_limit_bytes = 100 * 1024 * 1024; // 100MB
     m_env->rep_set_request(rep_req_min, rep_req_max);
     m_env->rep_set_limit(rep_limit_gbytes, rep_limit_bytes);
     cur_env.open(app_config->home, DB_CREATE | DB_RECOVER |
     DB_THREAD | DB_INIT_REP | DB_INIT_LOCK | DB_INIT_LOG |
     DB_INIT_MPOOL | DB_INIT_TXN , 0);
     //keep old function for chain
     //old_rep_process_message=cur_env.get_DB_ENV()->rep_process_message;
     //derouting
     //cur_env.get_DB_ENV()->rep_process_message=my_rep_process_message;
     /*int _i;
     cur_env.log_get_config(DB_LOG_DIRECT, &_i);printf ("DB_LOG_DIRECT = %d\n",_i);
     cur_env.log_get_config(DB_LOG_DSYNC, &_i);printf ("DB_LOG_DSYNC = %d\n",_i);
     cur_env.log_get_config(DB_LOG_AUTO_REMOVE, &_i);printf ("DB_LOG_AUTO_REMOVE = %d\n",_i);
     cur_env.log_get_config(DB_LOG_IN_MEMORY, &_i);printf ("DB_LOG_IN_MEMORY = %d\n",_i);
     cur_env.log_get_config(DB_LOG_ZERO,&_i);printf ("DB_LOG_ZERO = %d\n",_i);
     */
     // Start checkpoint and log archive support threads.
     (void)thread_create(&ckp_thr, NULL, checkpoint_thread, &cur_env);
     (void)thread_create(&lga_thr, NULL, log_archive_thread, &cur_env);
     (void)thread_create(&dmy_thr, NULL, dummy_write_thread, &cur_env);
     cur_env.repmgr_start(3, app_config->start_policy);
}

int RepQuoteExample::terminate() {
     try {
          // Wait for checkpoint and log archive threads to finish.
          // Windows does not allow NULL pointer for exit code variable.
          thread_exit_status_t exstat;
          (void)thread_join(lga_thr, &exstat);
          (void)thread_join(ckp_thr, &exstat);
          (void)thread_join(dmy_thr, &exstat);
          // We have used the DB_TXN_NOSYNC environment flag for
          // improved performance without the usual sacrifice of
          // transactional durability, as discussed in the
          // "Transactional guarantees" page of the Reference
          // Guide: if one replication site crashes, we can
          // expect the data to exist at another site. However,
          // in case we shut down all sites gracefully, we push
          // out the end of the log here so that the most
          // recent transactions don't mysteriously disappear.
          cur_env.log_flush(NULL);
          cur_env.close(0);
     } catch (DbException dbe) {
          cout << "error closing environment: " << dbe.what() << endl;
     }
     return 0;
}
void RepQuoteExample::prompt() {
     cout << "QUOTESERVER";
     if (!app_data.is_master)
          cout << "(read-only)";
     cout << "> " << flush;
}
void log(const char *msg) {
     time_t currentTime;
     // get and print the current time
     time (&currentTime); // fill currentTime with the current time
     char buff[255];
     strncpy(buff,ctime(&currentTime),sizeof(buff));
     char *p;
     // ctime() output ends with '\n'; cut the string there
     for(p =buff ; *p != '\n'; p++);
     *p = '\0';
     cerr << buff << " - " << msg << endl;
}
// Simple command-line user interface:
// - enter "<stock symbol> <price>" to insert or update a record in the
//     database;
// - just press Return (i.e., blank input line) to print out the contents of
//     the database;
// - enter "quit" or "exit" to quit.
void RepQuoteExample::doloop() {
     DbHolder dbh1(&cur_env,DATABASE);
     DbHolder dbh2(&cur_env,DATABASE2);
     DbHolder *dbh=&dbh1;
     DbTxn *txn;
     string input;
bool truncate = false;
     char *c;
     using_history();
     g_repquote=*dbh;
     int loop_rate = 0;
     int txn_rate = 500;
     while (prompt(), /*getline(cin, input)*/c=readline(NULL)) {
          input=std::string(c);
          add_history(c);
          free(c);
          int start_loop = 0;
          int end_loop = 0;
          int start_loop_d = 0;
          int end_loop_d = 0;
          istringstream is(input);
          string token1, token2, token3;
truncate = false;
start_loop = 0;
end_loop = 0;
          // Read up to three tokens from the input.
          int count = 0;
          if (is >> token1) {
               count++;
               if (is >> token2)
                    count++;
               if (is >> token3)
                    count++;
          }
          if (count == 1) {
               if (token1 == "truncate") {
                    truncate = true;
               } else if (token1 == "env") {
                    print_env(&cur_env);
                    continue;
               } else if (token1 == "verbose") {
                    app_config->verbose = !app_config->verbose;
                    if (app_config->verbose) {
                         cur_env.set_verbose(DB_VERB_REPLICATION, 1);
                         cur_env.set_verbose(DB_VERB_REPMGR_MISC, 1);
                         cur_env.set_verbose(DB_VERB_RECOVERY, 1);
                         cur_env.set_verbose(DB_VERB_REP_ELECT, 1);
                         cur_env.set_verbose(DB_VERB_REP_LEASE, 1);
                         cur_env.set_verbose(DB_VERB_REP_SYNC, 1);
                         log("verbose is on");
                    } else {
                         cur_env.set_verbose(DB_VERB_REPLICATION, 0);
                         cur_env.set_verbose(DB_VERB_REPMGR_MISC, 0);
                         cur_env.set_verbose(DB_VERB_RECOVERY, 0);
                         cur_env.set_verbose(DB_VERB_REP_ELECT, 0);
                         cur_env.set_verbose(DB_VERB_REP_LEASE, 0);
                         cur_env.set_verbose(DB_VERB_REP_SYNC, 0);
                         log("verbose is off");
                    }
                    continue;
               } else if (token1 == "print") {
                    print_stocks(*dbh);
                    count = 0;
               } else if (token1 == "db1") {
                    dbh = &dbh1;
                    g_repquote = *dbh;
                    log("switch to Db1");
                    count = 0;
               } else if (token1 == "db2") {
                    dbh = &dbh2;
                    g_repquote = *dbh;
                    log("switch to Db2");
                    count = 0;
               } else if (token1 == "exit" || token1 == "quit") {
                    app_data.app_finished = 1;
                    break;
               } else {
                    log("Format: <stock> <price>");
                    continue;
               }
          } else if (count == 2) {
               // "loop_rate <n>" and "txn_rate <n>" only tune the bulk loop.
               if (token1 == "loop_rate") {
                    loop_rate = atoi(token2.c_str());
                    continue;
               }
               if (token1 == "txn_rate") {
                    txn_rate = atoi(token2.c_str());
                    continue;
               }
          } else if (count == 3) {
               // "loop <start> <n>" inserts keys, "delete <start> <n>" removes them.
               if (token1 == "loop") {
                    start_loop = atoi(token2.c_str());
                    end_loop = start_loop + atoi(token3.c_str());
               }
               if (token1 == "delete") {
                    start_loop_d = atoi(token2.c_str());
                    end_loop_d = start_loop_d + atoi(token3.c_str());
               }
          }
          // Anything that reaches this point is about to try a DB operation.
          // Open database with DB_CREATE only if this is a master
          // database. A client database uses polling to attempt
          // to open the database without DB_CREATE until it is
          // successful.
          // This DB_CREATE polling logic can be simplified under
          // some circumstances. For example, if the application can
          // be sure a database is already there, it would never need
          // to open it with DB_CREATE.
          if (!dbh->ensure_open(app_data.is_master))
               continue;
          try {
               if (count == 0)
                    if (app_data.in_client_sync)
                         log( "Cannot read data during client initialization - please try again.");
                    else
                         print_stocks_size(*dbh);
               else if (!app_data.is_master)
                    log("Can't update at client");
               else {
                    if (truncate)
u_int32_t no_remove;
                    txn = NULL;
cur_env.txn_begin(NULL, &txn, DB_TXN_NOWAIT);
                         try
          (*dbh)->truncate(txn, &no_remove, 0);
// commit
txn->commit(0);
txn = NULL;
} catch (DbException &e) {
std::cout << "Error on txn commit: " << e.what() << std::endl;
                    //     } catch (DbDeadlockException &) {
                    if (txn != NULL)
                         (void)txn->abort();
// std::cout << "Error on txn commit: " << std::endl;
else if (start_loop)
int j=0;
for (int i=start_loop; i<=end_loop; i=i+txn_rate)
//transaction begin
               txn = NULL;
               cur_env.txn_begin(NULL, &txn, 0);
for (j=i; j<=end_loop && j<=(i+txn_rate); j++)
                              Dbt key, value;
     std::string key1, value1;
     std::stringstream sstrm;
     sstrm << "key" << j << ends;
     key1 = sstrm.str();
               key.set_data((void *)key1.c_str());
               key.set_size((u_int32_t)strlen(key1.c_str()));
     sstrm.str("");
     int payload = rand() + j;
                              sstrm << "price" << payload << ends;
     value1 = sstrm.str();
               value.set_data((void *)value1.c_str());
               value.set_size((u_int32_t)strlen(value1.c_str()));
     // Perform the database put
     (*dbh)->put(txn, &key, &value, 0);
                         printf("Kill me !!\n");
                         kill(getpid(), SIGKILL); // kill() takes the signal number, not "-9"
                         exit(0);
     try
                              // commit
                    txn->commit(0);
                    txn = NULL;
               } catch (DbException &e) {
                    std::cout << "Error on txn commit: " << e.what() << std::endl;
                         if (loop_rate>0)
                              usleep(txn_rate * 1000 * 1000 / loop_rate);
                    else if (start_loop_d)
int j=0;
for (int i=start_loop_d; i<=end_loop_d; i=i+100)
//transaction begin
               txn = NULL;
               cur_env.txn_begin(NULL, &txn, 0);
for (j=i; j<=end_loop_d && j<=(i+100); j++)
                              Dbt key, value;
     std::string key1, value1;
     std::stringstream sstrm;
     sstrm << "key" << j << ends;
     key1 = sstrm.str();
               key.set_data((void *)key1.c_str());
               key.set_size((u_int32_t)strlen(key1.c_str()));
     // Perform the database put
     (*dbh)->del(txn, &key, 0);
     try
                              // commit
                    txn->commit(0);
                    txn = NULL;
               } catch (DbException &e) {
                    std::cout << "Error on txn commit: " << e.what() << std::endl;
                    else
                         const char *symbol = token1.c_str();
                         StringDbt key(const_cast<char*>(symbol));
                         const char *price = token2.c_str();
                         StringDbt data(const_cast<char*>(price));
                         (*dbh)->put(NULL, &key, &data, 0);
          } catch (DbDeadlockException e) {
               log("please retry the operation");
               dbh->close();
          } catch (DbRepHandleDeadException e) {
               log("please retry the operation");
               dbh->close();
          } catch (DbException e) {
               if (e.get_errno() == DB_REP_LOCKOUT) {
               log("please retry the operation");
               dbh->close();
               } else
               throw;
     dbh->close();
void RepQuoteExample::event_callback(DbEnv* dbenv, u_int32_t which, void *info)
{
     static char buf[256];
     APP_DATA *app = (APP_DATA *)dbenv->get_app_private();
     info = NULL;          /* Currently unused. */
     switch (which) {
     case DB_EVENT_REP_CLIENT:
          app->is_master = 0;
          app->in_client_sync = 1;
          snprintf(buf, sizeof(buf), "%s - %s", progname, "CLIENT");
          //EZ->dbenv->set_errpfx(buf);
          log("DB_EVENT_REP_CLIENT.");
          break;
     case DB_EVENT_REP_MASTER:
          app->is_master = 1;
          app->in_client_sync = 0;
          snprintf(buf, sizeof(buf), "%s - %s", progname, "MASTER");
          //EZ->dbenv->set_errpfx(buf);
          log("DB_EVENT_REP_MASTER.");
          break;
     case DB_EVENT_REP_NEWMASTER:
          log("DB_EVENT_REP_NEWMASTER.");
          app->in_client_sync = 1;
          break;
     case DB_EVENT_REP_PERM_FAILED:
          // Did not get enough acks to guarantee transaction
          // durability based on the configured ack policy. This
          // transaction will be flushed to the master site's
          // local disk storage for durability.
          log("DB_EVENT_REP_PERM_FAILED.");
          log("Insufficient acknowledgements to guarantee transaction durability.");
          break;
     case DB_EVENT_REP_STARTUPDONE:
          app->in_client_sync = 0;
          log("DB_EVENT_REP_STARTUPDONE.");
          break;
     case DB_EVENT_REP_ELECTION_FAILED:
          log("DB_EVENT_REP_ELECTION_FAILED.");
          //g_runner->init(g_config);
          printf("Kill me !!\n");
          // kill() takes the signal number itself; -9 is rejected with EINVAL.
          kill(getpid(), SIGKILL);
          exit(0);
          break;
     case DB_EVENT_REP_DUPMASTER:
          log("DB_EVENT_REP_DUPMASTER.");
          break;
     default:
          dbenv->errx("ignoring event %d", which);
     }
}

void RepQuoteExample::print_stocks_size(Db *dbp) {
     DB_BTREE_STAT *statp;
     dbp->stat(NULL, &statp, 0);
     log("db_stat");
     cout << "***************************************** >>>>>>>>>>> : database contains " << (u_long)statp->bt_ndata << " records\n";
     // Db::stat() allocates the statistics structure; the caller frees it.
     free(statp);
}

void RepQuoteExample::print_env(DbEnv *dbenv) {
     dbenv->stat_print(DB_STAT_ALL);
}

void RepQuoteExample::print_stocks(Db *dbp) {
     StringDbt key, data;
#define     MAXKEYSIZE     10
#define     MAXDATASIZE     20
     char keybuf[MAXKEYSIZE + 1], databuf[MAXDATASIZE + 1];
     char *kbuf, *dbuf;
     memset(&key, 0, sizeof(key));
     memset(&data, 0, sizeof(data));
     kbuf = keybuf;
     dbuf = databuf;
     DbcAuto dbc(dbp, 0, 0);
     cout << "\tSymbol\tPrice" << endl
          << "\t======\t=====" << endl;
     int no_records = 0;
     for (int ret = dbc->get(&key, &data, DB_FIRST);
          ret == 0;
          ret = dbc->get(&key, &data, DB_NEXT)) {
          key.get_string(&kbuf, MAXKEYSIZE);
          data.get_string(&dbuf, MAXDATASIZE);
          no_records++;
          cout << "\t" << keybuf << "\t" << databuf << endl;
     }
     cout << "********************** NO Records " << no_records << endl;
     cout << endl << flush;
     dbc.close();
}

static void usage() {
     cerr << "usage: " << progname << " -h home -l host:port [-CM]"
     << "[-r host:port][-R host:port]" << endl
     << " [-a all|quorum][-b][-n nsites][-p priority][-v]" << endl;
     cerr << "\t -h home (required; h stands for home directory)" << endl
     << "\t -l host:port (required; l stands for local)" << endl
     << "\t -C or -M (optional; start up as client or master)" << endl
     << "\t -r host:port (optional; r stands for remote; any "
     << "number of these" << endl
     << "\t may be specified)" << endl
     << "\t -R host:port (optional; R stands for remote peer; only "
     << "one of" << endl
     << "\t these may be specified)" << endl
     << "\t -a all|quorum (optional; a stands for ack policy)" << endl
     << "\t -b (optional; b stands for bulk)" << endl
     << "\t -n nsites (optional; number of sites in replication "
     << "group; defaults " << endl
     << "\t     to 0 to try to dynamically compute nsites)" << endl
     << "\t -p priority (optional; defaults to 100)" << endl
     << "\t -v (optional; v stands for verbose)" << endl;
     exit(EXIT_FAILURE);
}

int main(int argc, char **argv) {
     RepConfigInfo config;
     int ch;               // getopt() returns int; EOF may not fit in a char
     char *portstr, *tmphost;
     int tmpport;
     bool tmppeer;
     config.no_dummy_wr = false;
     // Extract the command line parameters
     while ((ch = getopt(argc, argv, "E:a:bCh:l:Mn:p:R:r:vw")) != EOF) {
          tmppeer = false;
          switch (ch) {
          case 'a':
               if (strncmp(optarg, "all", 3) == 0)
                    config.ack_policy = DB_REPMGR_ACKS_ALL;
               else if (strncmp(optarg, "quorum", 6) != 0)
                    usage();
               break;
          case 'b':
               config.bulk = true;
               break;
          case 'C':
               config.start_policy = DB_REP_CLIENT;
               break;
          case 'E':
               config.start_policy = DB_REP_ELECTION;
               break;
          case 'h':
               config.home = optarg;
               break;
          case 'l':
               config.this_host.host = strtok(optarg, ":");
               if ((portstr = strtok(NULL, ":")) == NULL) {
                    cerr << "Bad host specification." << endl;
                    usage();
               }
               config.this_host.port = (unsigned short)atoi(portstr);
               config.got_listen_address = true;
               break;
          case 'M':
               config.start_policy = DB_REP_MASTER;
               break;
          case 'n':
               config.totalsites = atoi(optarg);
               break;
          case 'p':
               config.priority = atoi(optarg);
               break;
          case 'R':
               tmppeer = true; // FALLTHROUGH
          case 'r':
               tmphost = strtok(optarg, ":");
               if ((portstr = strtok(NULL, ":")) == NULL) {
                    cerr << "Bad host specification." << endl;
                    usage();
               }
               tmpport = (unsigned short)atoi(portstr);
               config.addOtherHost(tmphost, tmpport, tmppeer);
               break;
          case 'v':
               config.verbose = true;
               break;
          case 'w':
               config.no_dummy_wr = true;
               //config.priority = 2;
               break;
          case '?':
          default:
               usage();
          }
     }
     // Error check command line.
     if ((!config.got_listen_address) || config.home == NULL)
          usage();
     RepQuoteExample runner;
     g_runner = &runner;
     g_config = &config;
     try {
          runner.init(&config);
          runner.doloop();
     } catch (DbException &dbe) {
          cerr << "Caught an exception during initialization or"
               << " processing: " << dbe.what() << endl;
     }
     runner.terminate();
     return 0;
}

// This is a very simple thread that performs checkpoints at a fixed
// time interval. For a master site, the time interval is one minute
// plus the duration of the checkpoint_delay timeout (30 seconds by
// default.) For a client site, the time interval is one minute.
void *checkpoint_thread(void *args)
{
     DbEnv *env;
     APP_DATA *app;
     int i, ret;
     env = (DbEnv *)args;
     app = (APP_DATA *)env->get_app_private();
     for (;;) {
          // Wait for one minute, polling once per second to see if
          // application has finished. When application has finished,
          // terminate this thread.
          for (i = 0; i < 60; i++) {
               sleep(1);
               if (app->app_finished == 1)
                    return ((void *)EXIT_SUCCESS);
          }
          // Perform a checkpoint.
          // original line
          if ((ret = env->txn_checkpoint(0, 0, 0)) != 0) {
          //if ((ret = env->txn_checkpoint(0, 0, DB_FORCE)) != 0) {
               env->err(ret, "Could not perform checkpoint.\n");
               return ((void *)EXIT_FAILURE);
          }
     }
}

// This is a simple log archive thread. Once per minute, it removes all but
// the most recent 3 logs that are safe to remove according to a call to
// DBENV->log_archive().
// Log cleanup is needed to conserve disk space, but aggressive log cleanup
// can cause more frequent client initializations if a client lags too far
// behind the current master. This can happen in the event of a slow client,
// a network partition, or a new master that has not kept as many logs as the
// previous master.
// The approach in this routine balances the need to mitigate against a
// lagging client by keeping a few more of the most recent unneeded logs
// with the need to conserve disk space by regularly cleaning up log files.
// Use of automatic log removal (DBENV->log_set_config() DB_LOG_AUTO_REMOVE
// flag) is not recommended for replication due to the risk of frequent
// client initializations.
void *log_archive_thread(void *args)
{
     DbEnv *env;
     APP_DATA *app;
     char **begin, **list;
     int i, listlen, logs_to_keep, minlog, ret;
     env = (DbEnv *)args;
     app = (APP_DATA *)env->get_app_private();
     logs_to_keep = 3;
     for (;;) {
          // Wait for one minute, polling once per second to see if
          // application has finished. When application has finished,
          // terminate this thread.
          for (i = 0; i < 60; i++) {
               sleep(1);
               if (app->app_finished == 1)
                    return ((void *)EXIT_SUCCESS);
          }
          // Get the list of unneeded log files.
          if ((ret = env->log_archive(&list, DB_ARCH_ABS)) != 0) {
               env->err(ret, "Could not get log archive list.");
               return ((void *)EXIT_FAILURE);
          }
          if (list != NULL) {
               listlen = 0;
               // Get the number of logs in the list.
               for (begin = list; *begin != NULL; begin++, listlen++);
               // Remove all but the logs_to_keep most recent
               // unneeded log files.
               minlog = listlen - logs_to_keep;
               for (begin = list, i = 0; i < minlog; list++, i++) {
                    if ((ret = unlink(*list)) != 0) {
                         env->err(ret,
                              "logclean: remove %s", *list);
                         env->errx(
                              "logclean: Error remove %s", *list);
                         free(begin);
                         return ((void *)EXIT_FAILURE);
                    }
               }
               free(begin);
          }
     }
}

#define DATABASE_DUMMY "dummy.db"
void create_dummy_db(DB_ENV *env, DB **dbp)
{
     DB_ENV *dbenv = env;
     int ret;
     u_int32_t db_flags;
     if ((ret = db_create(dbp, dbenv, 0)) != 0)
          dbenv->err(dbenv, ret, "create_dummy_db: db_create");
     db_flags = DB_AUTO_COMMIT | DB_CREATE;
     //if ((ret = (*dbp)->open(*dbp,NULL, DATABASE, NULL, DB_BTREE, db_flags, 0)) != 0)
     if ((ret = (*dbp)->open(*dbp, NULL, NULL, DATABASE_DUMMY, DB_BTREE, db_flags, 0)) != 0)
          dbenv->err(dbenv, ret, "create_dummy_db: DB->open");
}

void reopen_dummy_db(DB_ENV *env, DB **dbp)
{
     DB_ENV *dbenv = env;
     int ret;
     u_int32_t db_flags;
     if ((ret = db_create(dbp, dbenv, 0)) != 0)
          dbenv->err(dbenv, ret, "reopen_dummy_db: db_create");
     db_flags = DB_AUTO_COMMIT | DB_CREATE;
     //if ((ret = (*dbp)->open(*dbp,NULL, DATABASE, NULL, DB_BTREE, db_flags, 0)) != 0)
     if ((ret = (*dbp)->open(*dbp, NULL, NULL, DATABASE_DUMMY, DB_BTREE, db_flags, 0)) != 0)
          dbenv->err(dbenv, ret, "reopen_dummy_db: DB->open");
}

void perform_db_operation(DB_ENV *env, DB **dbp, bool bRead)
{
     DB_ENV *dbenv = env;
     int ret;
     DBT key, data;
     char buf[20] = "dummy", *rbuf;
     rbuf = buf;
     if (*dbp == NULL)
          create_dummy_db(dbenv, dbp);
     if (!bRead) {
          // Write the same dummy record each time to generate replication traffic.
          memset(&key, 0, sizeof(key));
          memset(&data, 0, sizeof(data));
          key.data = buf;
          key.size = (u_int32_t)strlen(buf);
          data.data = rbuf;
          data.size = (u_int32_t)strlen(rbuf);
          if ((ret = (*dbp)->put(*dbp, NULL, &key, &data, 0)) != 0) {
               if (ret == DB_REP_HANDLE_DEAD) {
                    //create_dummy_db(dbenv, dbp);
                    reopen_dummy_db(dbenv, dbp);
                    (*dbp)->err(*dbp, ret, "DB->put :");
               } else if (ret != DB_KEYEXIST)
                    (*dbp)->err(*dbp, ret, "perform_db_operation: DB->put");
          }
     } else {
          DB_BTREE_STAT *statp;
          (*dbp)->stat(*dbp, NULL, &statp, 0);
          std::cout << "dbp read stats: key#" << statp->bt_nkeys << std::endl;
          free(statp);
     }
}

void *dummy_write_thread(void *args)
{
     DbEnv *env;
     APP_DATA *app;
     // Must start as NULL: perform_db_operation() creates the handle on first use.
     DB *m_dbp = NULL;
     env = (DbEnv *)args;
     app = (APP_DATA *)env->get_app_private();
     for (;;) {
          // terminate() joins this thread, so stop once the application has finished.
          if (app->app_finished == 1)
               return ((void *)EXIT_SUCCESS);
          if (!app->no_dummy_wr) {
               if (app->is_master)
                    perform_db_operation(env->get_DB_ENV(), &m_dbp, false);
                    //env->txn_checkpoint(0, 0, DB_FORCE);
               usleep(1 * 1000 * 1000);
          } else {
               if (app->is_master)
                    //DB *db_quote=g_repquote->get_DB();
                    //perform_db_operation(env->get_DB_ENV(),&db_quote,true);
                    //if (g_repquote)
                    //     g_runner->print_stocks_size(g_repquote);
                    //env->txn_checkpoint(0, 0, DB_FORCE);
                    //perform_db_operation(env->get_DB_ENV(),&m_dbp,false);
                    env->rep_flush();
               usleep(4 * 1000 * 1000);
          }
     }
}

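
The retention rule used by log_archive_thread (keep the few most recent removable logs, delete the rest) can be looked at in isolation from Berkeley DB. The following is only a sketch: logs_to_remove is a hypothetical helper that is not part of the posted program, and it assumes the list of removable logs is ordered oldest first, which is what the removal loop in log_archive_thread also relies on.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: given the removable log files ordered oldest first,
// return the ones to delete so that the `logs_to_keep` most recent
// removable logs stay on disk for lagging clients.
std::vector<std::string> logs_to_remove(const std::vector<std::string> &removable,
                                        std::size_t logs_to_keep)
{
    // Nothing to delete when the list is no longer than the retention count.
    if (removable.size() <= logs_to_keep)
        return {};
    // Everything except the last `logs_to_keep` entries is deleted.
    return std::vector<std::string>(
        removable.begin(),
        removable.end() - static_cast<std::ptrdiff_t>(logs_to_keep));
}
```

With five removable logs and logs_to_keep set to 3, only the two oldest names are returned; with three or fewer, nothing is removed, which matches the case where minlog is zero or negative in the thread above.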
My script to simulate the split brain:
#!/bin/sh
[ -z "$node1" ] && node1=10.10.32.121
[ -z "$node2" ] && node2=10.10.32.91
trap myend 0 1 2 3 6 9 14 15
myend()
{
     echo "Received signal to stop test..."
     un_split_brain
     echo "done"
     exit 1
}
split_brain()
{
     echo -n "Split-Brain at node $node..."
     snmpset -m ALL -v 2c -c svil 10.10.0.100 ifAdminStatus.41 i 2 >/dev/null 2>&1
     echo "done"
}
un_split_brain()
{
     echo -n "Undo Split-Brain at node $node..."
     snmpset -m ALL -v 2c -c svil 10.10.0.100 ifAdminStatus.41 i 1 >/dev/null 2>&1
     echo "done"
}
is_slave()
{
     local r=$(ssh root@$1 "tail -2 /tmp/BDB.log" | grep -c CLIENT)
     [ $r -gt 1 ] && ret=1 || ret=0
     return $ret
}
is_master()
{
     local r=$(ssh root@$1 "tail -2 /tmp/BDB.log" | grep -c MASTER)
     [ $r -gt 1 ] && ret=1 || ret=0
     return $ret
}
wait_for_master()
{
     echo -n "Waiting for MASTER at node $node ... "
     is_master $node
     r=$?
     while ( [ ! $r -eq 1 ] )
     do
          usleep 500000
          is_master $node
          r=$?
          echo -n "."
     done
     echo "done"
}
wait_for_slave()
{
     local r
     local tm
     tm=0
     echo -n "Waiting for SLAVE at node $node ... "
     is_slave $node
     r=$?
     while ( [ ! $r -eq 1 ] )
     do
          usleep 500000
          is_slave $node
          r=$?
          echo -n "."
          tm=$((tm+1))
          [ $tm -gt 120 ] && break
     done
     [ $tm -gt 120 ] && ret=0 || ret=1
     echo "done"
     return $ret
}
run_test_split_brain()
{
     local nt
     nt=1
     nfails=0
     x=4
     [ -z "$1" ] && node=$node2
     while ((1))
     do
          printf "*************** TEST [%02d] ********************\n" $nt
          split_brain
          wait_for_master
          x=$((RANDOM%9))
          echo -n " waiting $x sec ..."
          sleep $x
          echo "done"
          un_split_brain
          wait_for_slave
          r=$?
          [ ! $r -eq 1 ] && echo "`date` - test [$nt] - fails ..." || echo "`date` - test [$nt] - OK ."
          [ ! $r -eq 1 ] && nfails=$((nfails+1))
          perc_failure=$(echo "100.0 - $nfails / $nt * 100.0" | bc -l)
          echo "************************************************ [% Success test $perc_failure % ]"
          nt=$((nt+1))
          x=$((RANDOM%9))
          echo -n " waiting $x sec ..."
          sleep $x
     done
}
run_test_split_brain
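
The wait_for_slave step is a bounded poll: check a condition at a fixed interval and give up after a maximum number of tries. For anyone who wants the same check inside the C++ test program instead of the shell, a minimal sketch could look like this. poll_until and its parameters are hypothetical names, not part of the posted code, and the condition is passed in as a callback, so nothing here depends on Berkeley DB.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: call `check` up to `max_tries` times, sleeping
// `interval` after each failed attempt. Returns true as soon as `check`
// succeeds, false if every attempt failed (timeout).
bool poll_until(const std::function<bool()> &check,
                std::chrono::milliseconds interval,
                int max_tries)
{
    assert(max_tries > 0);
    for (int tries = 0; tries < max_tries; ++tries) {
        if (check())
            return true;
        std::this_thread::sleep_for(interval);
    }
    return false;
}
```

Called with a 500 ms interval and about 120 tries, this would roughly mirror the one-minute limit in the script. Unlike the script, it reports success as true rather than through a return code of 1.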
Here is the makefile used to run the two environments.
I run:
- make run
and, in another window, sh test_split_brain.sh
node1?=10.10.32.121
node2?=10.10.32.91
nsite?=2
debug?=0
all: RepQuoteExampleEric install
RepConfigInfo.o: RepConfigInfo.cpp RepConfigInfo.h
     g++ -I/usr/local/BerkeleyDB.5.1/include/ -g -O0 -c RepConfigInfo.cpp -o RepConfigInfo.o
RepQuoteExampleEric: RepQuoteExampleEric.cpp RepConfigInfo.o
     g++ -I/usr/local/BerkeleyDB.5.1/include/ -g -O0 RepQuoteExampleEric.cpp RepConfigInfo.o -o RepQuoteExampleEric -L /usr/local/BerkeleyDB.5.1/lib/ -lreadline -lcurses -ldb_cxx
kill:
     -ssh -X root@$(node1) "killall -9 /root/RepQuoteExampleEric"
     -ssh -X root@$(node2) "killall -9 /root/RepQuoteExampleEric"
run: RepQuoteExampleEric kill install clean_env
     ssh -X root@$(node1) "xterm -geom 100x20+100+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.110:12345 -r 2.0.0.210:12345 -a quorum -b -n $(nsite) -v | tee /tmp/BDB.log\"" &
     ssh -X root@$(node2) "xterm -geom 100x20+800+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.210:12345 -r 2.0.0.110:12345 -a quorum -b -n $(nsite) -v -w | tee /tmp/BDB.log\"" &
run_node2: clean_env2
     ssh -X root@$(node2) "xterm -geom 100x20+800+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.210:12345 -r 2.0.0.110:12345 -a quorum -b -n $(nsite) -v -w | tee /tmp/BDB.log\"" &
debug_node2: clean_env2
     ssh -X root@$(node2) "xterm -geom 100x20+800+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.210:12345 -r 2.0.0.110:12345 -a quorum -b -n $(nsite) -v -w | tee /tmp/BDB.log\"" &
     sleep 3
     ssh -X root@$(node2) /sbin/pidof RepQuoteExampleEric >/tmp/pid
     ssh -X root@$(node2) ~/kdbg /root/db-5.1.19/examples/cxx/excxx_repquote/RepQuoteExampleEric -p `cat /tmp/pid`
run_debug_node1: RepQuoteExampleEric kill install clean_env
     ssh -X root@$(node1) "xterm -geom 100x20+100+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/kdbg /root/RepQuoteExampleEric\" " &
     ssh -X root@$(node2) "xterm -geom 100x20+800+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.210:12345 -r 2.0.0.110:12345 -a quorum -b -n $(nsite) -v\"" &
run_debug_node2: RepQuoteExampleEric kill install clean_env
     ssh -X root@$(node1) "xterm -geom 100x20+100+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.110:12345 -r 2.0.0.210:12345 -a quorum -b -n $(nsite) -v\" " &
     ssh -X root@$(node2) "xterm -geom 100x20+800+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/kdbg /root/RepQuoteExampleEric\"" &
install: RepQuoteExampleEric
     scp RepQuoteExampleEric root@$(node1):~
     scp RepQuoteExampleEric root@$(node2):~
clean_env: clean_env1 clean_env2
clean_env1:
     ssh -X root@$(node1) rm -rf /opt/bdb/*
clean_env2:
     ssh -X root@$(node2) rm -rf /opt/bdb/*

HDV tape random capture aborts in FCP

I am trying to capture many hours of HDV footage to disc. Capture Now is aborting frequently either due to alleged timecode breaks or alleged stream problems. But in fact the breaks seem to occur at random: on second or third attempts, capture passes smoothly over the supposed break point, only to abort again later on. I am rarely able to capture more than a minute at a time. Here are the hard facts:
Tapes are Maxell Professional DV-M63Master ME DV/HDV, recorded on only once
Camera for both shooting and playback is Sony HVR-V1P, connected via FireWire to:
iMac G4 1.25 GHz, 1 GB RAM, running OS X 10.5.8 and FCP 6.0.6
Capture disc is G-Tech 1TB G-drive, connected via FireWire (just bought to replace LaCie drive in the hope that it would solve the problem)
FCP is set to abort capture on dropped frames and on timecode break (I understand that this second option makes no difference with HDV anyway). I don't want to miss frames that are actually on the tape.
I know lots of people have had similar problems, and I've tried all the suggestions I have found on the forums. Have I missed something?

Julian,
With all due respect, you're asking quite a bit from a single G4 system. If you were back on Tiger and running FCP 5.1.4 you'd have maybe 25% more power, but you're real close to the edge. (I still have a dual 1.25 GHz G4 PowerMac and experimented with FCS2 on it...)
Attaching a fast disk array (RAID) might help a bit, but I doubt the central problem is disk speed. In reality, I think you only need to maintain something like 5MB/sec data transfer rate for HDV, but there is overhead processing and I'm guessing that a system that age probably has some "baggage."
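
As a rough check on that figure, the sustained rate can be worked out from the stream bitrate. The sketch below assumes HDV 1080i's nominal video bitrate of about 25 Mbit/s and ignores audio and transport overhead; the function names are made up for illustration.

```cpp
#include <cassert>

// Convert a stream bitrate in megabits per second to megabytes per second.
double mbit_to_mbyte_per_sec(double mbit_per_sec)
{
    assert(mbit_per_sec >= 0.0);
    return mbit_per_sec / 8.0;
}

// Disk space in gigabytes (10^9 bytes) needed for `hours` of footage
// captured natively at the given bitrate.
double gigabytes_for(double mbit_per_sec, double hours)
{
    return mbit_to_mbyte_per_sec(mbit_per_sec) * 3600.0 * hours / 1000.0;
}
```

Under that assumption, 25 Mbit/s works out to a little over 3 MB/s and roughly 11 GB per hour of tape, so a 5 MB/s budget leaves some headroom and a single FireWire drive should not be the limiting factor on raw throughput alone.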
I'd suggest that you freshen up the system by exporting non-critical data from the system drive until you have about 20-30% free space, repair permissions, run your crons (a utility like MacJanitor or Cocktail will do this) and run Disk Warrior on all attached drives. The point is to clean out the system to provide max efficiency at the basic level. You may need to disable apps that like to run in background and maybe even take the system off the net while working. If you can re-gain the original performance, you will probably make it through...
Good luck!
