We use Oracle GoldenGate (expensive and probably overkill for Oracle->MSSQL, but good at what it does) to replicate data from an Oracle database into SQL Server. The other day I got an alert that replication had stopped. When I checked the status, all the REPs we had set up showed “Starting…”, but none were actually doing anything.
Attempting to stop them got the following error:
GGSCI (GGSERVER) 68> stop rep MYREP
Sending STOP request to REPLICAT MYREP ...
ERROR: opening port for REPLICAT MYREP (TCP/IP error: Connection refused).
Stopping and starting the manager service, and even rebooting the PC, didn’t help: they still said “Starting” and were unresponsive. Even stranger, deleting and recreating a REP gave the same result. Before I had even attempted to start the new REP for the first time, it said “Starting”, and an attempt to start it gave me “Process is starting up – try again later”.
The cause was the REP process status file, located in the DIRPCS folder under the GoldenGate root. There should be a file for each REP that’s currently running, giving details about its status; when a REP stops, this file is deleted. Since none of the current REPs were doing anything (they were all sitting at the end of the previous trail file), they should have been stopped. I deleted the PCR files for the affected REP streams, after which the manager reported them as “STOPPED”. At that point, I was able to start up each REP without issue.
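If you have a lot of REPs, hunting for these files by hand gets tedious. Here is a rough Python sketch of the cleanup, not anything that ships with GoldenGate. It assumes the status files sit directly in a folder named dirpcs under the GoldenGate root, that each file is named after its REP, and that it carries a .pcr extension. The function names and the dry-run default are my own invention for illustration.

```python
from pathlib import Path


def find_pcr_files(gg_home):
    """Return the REP status files (*.pcr, any case) under <gg_home>/dirpcs.

    Assumption: the folder is named "dirpcs" and holds one file per REP.
    """
    dirpcs = Path(gg_home) / "dirpcs"
    if not dirpcs.is_dir():
        return []
    return sorted(
        p for p in dirpcs.iterdir()
        if p.is_file() and p.suffix.lower() == ".pcr"
    )


def remove_pcr_files(gg_home, rep_names, dry_run=True):
    """Delete the status files whose base name matches one of rep_names.

    With dry_run=True (the default) nothing is deleted; the matching
    paths are only returned so they can be reviewed first.
    """
    wanted = {name.upper() for name in rep_names}
    matches = [p for p in find_pcr_files(gg_home) if p.stem.upper() in wanted]
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

Calling remove_pcr_files with your GoldenGate root and a list such as ["MYREP"] only lists the matching files; pass dry_run=False to actually delete them. Only do that for REPs you are sure are not really running, since a healthy running REP is supposed to have its status file.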
I’m not sure how they got that way, but once started again, they all worked without issue. I hope this saves you the troubleshooting time of hunting down these files!