
Replication fixed :-)

Well, I think I've sorted the replication problems... :-)



So, I figured I'd best check that the new DC was getting replicated data correctly... There's a nice little tool in the Windows 2000 Support Tools called "Active Directory Replication Monitor" that makes this dead easy. Everything looked OK between the temp DC and the new DC, but by default the tool wasn't showing the USNs (update sequence numbers), so I turned on the "Show Transitive Partners and Extended Data" option... and noticed there's also a "Show Retired Replication Partners" option in there too, so I turned that on as well. The USNs let you see what revision of the AD data is on each replication partner, which means you can double-check the tool's assertion that replication is up to date. The retired partners option brought up 5 deleted servers (named **DELETED SERVER #n, where n is a number)... This got me thinking about two things: firstly, where the system had managed to come up with that many ex-servers (I can only think of 4 servers I've had installed in this AD, not 5), though it doesn't really matter; and secondly, were the replication errors due to orphaned bits of server info lying around after server demotions?
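To make the USN cross-check concrete, here's a minimal Python sketch. It assumes a simplified model where a partner is caught up with a source DC once the highest of that source's USNs the partner has applied is at least the source's current highest committed USN. The DC names and USN values are made up for illustration.

```python
# Simplified model of the USN cross-check. A partner has every change from a
# source DC once the highest source USN it has applied is at least the
# source's current highest committed USN. Names and numbers are hypothetical.


def is_up_to_date(source_highest_usn: int, partner_applied_usn: int) -> bool:
    """True if the partner has applied every change the source has committed."""
    return partner_applied_usn >= source_highest_usn


def lagging_partners(source_highest_usn: int, partners: dict[str, int]) -> list[str]:
    """Names of partners still behind the source, sorted alphabetically."""
    return sorted(
        name
        for name, applied in partners.items()
        if not is_up_to_date(source_highest_usn, applied)
    )


if __name__ == "__main__":
    # Hypothetical USNs, as you might read them off the tool's extended data.
    partners = {"NEWDC": 20514, "OTHERDC": 20490}
    print(lagging_partners(20514, partners))  # ['OTHERDC']
```

The real tool tracks this per source DC, so the check above would be repeated for each source a partner replicates from.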

The second one is really the key to this. Anyway, I had a nosey in the Directory Service event log, and all the events are essentially the same:
Event Type:	Error
Event Source:	NTDS Replication
Event Category:	DS RPC Client 
Event ID:	1411
Date:		26.10.2007
Time:		14:48:34
User:		NT AUTHORITY\ANONYMOUS LOGON
Computer:	TEMPORARY
Description:
Active Directory failed to construct a mutual authentication service 
principal name (SPN) for the following domain controller. 
 
Domain controller:
e0362b05-fe28-4edd-b5da-80238b5fe17f._msdcs.DOMAIN.DOM.AIN 
 
The call was denied. Communication with this domain controller might 
be affected. 
 
Additional Data 
Error value:
8589 The DS cannot derive a service principal name (SPN) with which 
to mutually authenticate the target server because the corresponding 
server object in the local DS database has no serverReference 
attribute.

For more information, see Help and Support Center at 
http://go.microsoft.com/fwlink/events.asp.

I went through about 20 of these, and only ever two different DCs were identified. Running dig _msdcs.DOMAIN.DOM.AIN axfr | grep CNAME on my Linux box showed that the two GUIDs (unique IDs) from the event logs didn't match the ones assigned to my two current DCs... so I was fairly sure I was right, and this was a problem with stuff orphaned during the removal of old DCs.
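The GUID comparison can be scripted rather than eyeballed. This is a rough Python sketch, assuming the DNS server allows the zone transfer and that dig prints one record per line as owner, TTL, class, type, data. The second GUID and the dc1/dc2 host names in the sample are hypothetical; only the e0362b05... GUID comes from the event above.

```python
import re

# Matches a bare GUID such as e0362b05-fe28-4edd-b5da-80238b5fe17f
GUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.IGNORECASE
)


def guid_cnames(axfr_output: str) -> dict[str, str]:
    """Map each GUID CNAME owner in a dig AXFR dump to its target host."""
    result = {}
    for line in axfr_output.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and dig's comment lines
        fields = line.split()
        # Expected layout: owner TTL class type data
        if len(fields) < 5 or fields[3].upper() != "CNAME":
            continue
        label = fields[0].split(".", 1)[0]
        if GUID_RE.match(label):
            result[label.lower()] = fields[4].rstrip(".")
    return result


def orphaned(event_dc_names: list[str], axfr_output: str) -> list[str]:
    """GUIDs named in the event log that have no matching CNAME in DNS."""
    known = set(guid_cnames(axfr_output))
    # Event log names look like <guid>._msdcs.<domain>; keep just the GUID.
    from_events = {name.split(".", 1)[0].lower() for name in event_dc_names}
    return sorted(from_events - known)


if __name__ == "__main__":
    sample = """\
; <<>> DiG <<>> _msdcs.DOMAIN.DOM.AIN axfr
11111111-2222-3333-4444-555555555555._msdcs.DOMAIN.DOM.AIN. 600 IN CNAME dc1.DOMAIN.DOM.AIN.
aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee._msdcs.DOMAIN.DOM.AIN. 600 IN CNAME dc2.DOMAIN.DOM.AIN.
"""
    events = ["e0362b05-fe28-4edd-b5da-80238b5fe17f._msdcs.DOMAIN.DOM.AIN"]
    print(orphaned(events, sample))  # ['e0362b05-fe28-4edd-b5da-80238b5fe17f']
```

In practice you'd paste the real dig output into axfr_output and list the DC names from the 1411 events; anything returned has no live DC behind it.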

Next stop was MS's website, where a search for "NTDS Replication event id 1411" (from the event log entries) yielded a possible fix.

The procedure from MS is KB938704, but it's overkill for my particular problem, because it deals with three situations that can cause this 1411 event: Active Directory having been removed from a DC, a remote DC having been orphaned, and a remote DC having its service principal names (SPNs) missing from its computer object.

So, it was really just a case of filtering out the unnecessary bits...

In the end, the fix was as simple as:
1) Using regedit to create a new DWORD value "RepsTo Failure Time (sec)" in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters and setting it to 60
2) Running repadmin /kcc from a command prompt to force the Knowledge Consistency Checker (KCC) to remove stale replication entries
3) Using AD Sites and Services to force each DC to "Check Replication Topology" (under Servers, right-click each DC's NTDS Settings and pick it from the All Tasks submenu)
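If you'd rather not click through regedit, step 1 (and taking the setting back out afterwards) can be done with .reg files. This is a hypothetical Python helper that only generates the file text, assuming the standard regedit import format where dword: takes eight hex digits and =- deletes a value; the key and value name are the ones from step 1. Writing the value in one go also means it never sits at the default of 0 in between.

```python
# Hypothetical helper: builds .reg file text for the NTDS registry tweak.
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters"
VALUE = "RepsTo Failure Time (sec)"
HEADER = "Windows Registry Editor Version 5.00\r\n\r\n"


def reg_add(seconds: int) -> str:
    """Return .reg text that creates the DWORD with `seconds` in one step."""
    # 0 is rejected: per the post, it isn't a valid value for this setting.
    if not 0 < seconds <= 0xFFFFFFFF:
        raise ValueError("value must be a positive 32-bit DWORD")
    return HEADER + f"[{KEY}]\r\n" + f'"{VALUE}"=dword:{seconds:08x}\r\n'


def reg_remove() -> str:
    """Return .reg text that deletes the value again ("=-" means delete)."""
    return HEADER + f"[{KEY}]\r\n" + f'"{VALUE}"=-\r\n'


if __name__ == "__main__":
    print(reg_add(60), end="")
```

Save the output of reg_add(60) as a .reg file and import it with regedit /s (or just double-click it); reg_remove() gives the matching file for putting things back the way they were.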

MS then go on to add new replication links for any DCs that are missing, but none of mine were, so that was that.

Well, according to MS's procedure anyway... I figured I'd take the registry setting back out again so that everything was as it was when this all started... MS don't do that for some reason...

Anyway, I applied that process at about 3pm, and it's now 45 minutes later. I was getting log entries about the NTDS Replication glitches every 2 to 10 minutes, and I haven't had a single 1411 entry in that time.
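Rather than watching Event Viewer, you could count the 1411s in a text export of the Directory Service log. A minimal Python sketch, assuming the export uses the same "Field: value" header layout as the event shown earlier and dd.mm.yyyy dates (as in 26.10.2007); parse_events and count_since are hypothetical names.

```python
from datetime import datetime

# Header fields of the text export; description lines are ignored.
HEADER_FIELDS = {"Event Type", "Event Source", "Event Category", "Event ID",
                 "Date", "Time", "User", "Computer"}


def parse_events(text: str) -> list[dict[str, str]]:
    """Split a text export of the event log into one header dict per event."""
    events = []
    current = None
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # no "Field:" prefix, e.g. a description line
        key, value = key.strip(), value.strip()
        if key == "Event Type":
            current = {}  # each event starts with its Event Type line
            events.append(current)
        # Keep the first occurrence so description text can't override headers.
        if current is not None and key in HEADER_FIELDS and key not in current:
            current[key] = value
    return events


def count_since(events: list[dict[str, str]], event_id: str,
                since: datetime, date_fmt: str = "%d.%m.%Y") -> int:
    """Count events with the given ID logged at or after `since`."""
    count = 0
    for ev in events:
        if ev.get("Event ID") != event_id or "Date" not in ev or "Time" not in ev:
            continue
        stamp = datetime.strptime(f"{ev['Date']} {ev['Time']}",
                                  f"{date_fmt} %H:%M:%S")
        if stamp >= since:
            count += 1
    return count
```

For example, count_since(parse_events(text), "1411", datetime(2007, 10, 26, 15, 0)) should come back as 0 if nothing has been logged since the fix went in at 3pm; swap the ID for "1104" to count the KCC clean-up events instead.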

I did get a 1313 event, but it looks like it was from when I created the registry key: new DWORDs default to 0, and that's not a valid value for this setting, so the KCC tells you it'll use 1 instead...

I also got some 1104 KCC events telling me that the KCC had "successfully terminated the following change notifications", listing the IDs that were causing the original 1411s :-)

So, all looks good in DS Replication land!