Had an issue that’s affected multiple deployments that I’ve been working on. At one of my clients in particular, this reached an escalation level. Just sharing this to assist others, as we went through 4 TAC Engineers to get a root cause and eventually this was attributed to a hidden Cisco defect in 10.x and above… The first engineer told us that we needed to rebuild an 18 node CUCM cluster from scratch to fix this!
Just sharing this to assist others, as we went through 4 TAC Engineers to get a root cause and eventually this was attributed to a hidden Cisco defect in 10.x and above… The first engineer told us that we needed to rebuild an 18 node CUCM cluster from scratch to fix this! Don’t always believe TAC!
- Client complains that auto-registered phones are failing to register
- Eventually determined that the phone had been previously registered and what subsequently deleted
- Certain clients say that if they manually configure the phone, it registers
- Phone logs show a TFTP config error
Went on site and noted that there were persistent DB-related syslog alarms being generated.
The SDL logs showed the registration failure and a syslog event was logged:
00024541.012 |07:28:21.243 |AppInfo |SIPStationD(269) - checkDNsReceived: TotalCount=0 00024541.013 |07:28:21.243 |AppInfo |SIPStationD(269) - sendLineRegisterReq: DN match failed for AUTO-REG 00024541.014 |07:28:21.243 |AppInfo |SIPStationD(269) - sendLineRegisterReq: Phone is autoregistering...sending 200OK followed by a restart request 00024541.015 |07:28:21.243 |AppInfo |SIPStationD(269) - buildServiceControl: Sending callid from mPrimaryLineNum 0: 0UTqUn-9I4FebBfs1swQbo.GQ78HIxyf 00024541.016 |07:28:21.243 |AppInfo |SIPStationD(269) - sendServiceControlNotify: starting servicecontrol NOTIFY timer 00024541.017 |07:28:21.243 |AppInfo |SIPStationD(269) - sendServiceControlNotify: sending service-control NOTIFY 00024541.018 |07:28:21.243 |AppInfo |EndPointTransientConnection - An endpoint attempted to register but did not complete registration Connecting Port:5060 Device name:SEPAAAABBBBCCCC Device IP address:192.168.100.123 Device type:592 Reason Code:7 Protocol:SIP Device MAC address:AAAABBBBCCCC IPAddressAttributes:0 LastSignalReceived:SIPRegisterInd StationState:auto_register App ID:Cisco CallManager Cluster ID:MYCUCM Node ID:mycucmhostame 00024541.019 |07:28:21.243 |AlarmErr |AlarmClass: CallManager, AlarmName: EndPointTransientConnection, AlarmSeverity: Error, AlarmMessage: , AlarmDescription: An endpoint attempted to register but did not complete registration, AlarmParameters: ConnectingPort:5060, DeviceName:SEPAAAABBBBCCCC, IPAddress:192.168.100.123, DeviceType:592, Reason:7, Protocol:SIP, MACAddress:AAAABBBBCCCC, IPAddrAttributes:0, LastSignalReceived:SIPRegisterInd, StationState:auto_register, AppID:Cisco CallManager, ClusterID:MYCUCM, NodeID:mycucmhostame, 00024542.000 |07:28:21.243 |AppInfo |-->RISCMAccess::DeviceTransientConnection(...) 00024543.000 |07:28:21.243 |AppInfo |Device Transient deviceName : SEPAAAABBBBCCCC, IPAddress : 192.168.100.123, IPv6Address : not shown, IPv4Attribute :0, IPv6Attribute :0, Protocol : 2 00024544.000 |07:28:21.243 |AppInfo |DebugMsg deviceName : SEPAAAABBBBCCCC, DeviceType : 592, risClass: 1 00024545.000 |07:28:21.243 |AppInfo |<--RISCMAccess::DeviceTransientConnection(...) 00024541.020 |07:28:21.243 |AppInfo |SIPStationD(269) - Received register from Cisco phone for unconfigured line AUTO-REG; shutting down 00024541.021 |07:28:21.243 |AppInfo |StationTracking::updateMismatchCount: deviceId=SEPAAAABBBBCCCC, mismatchCount=174 00024541.022 |07:28:21.243 |AppInfo |SIPStationD(269) - processVersionStampMismatch: VersionStampMismatch detected, count = 174 00024541.023 |07:28:21.243 |AppInfo |SIPStationD(269) - DevStat-StopClose: LineRegisterReq send failure
And later we in the same trace see that CUCM sends a NOTIFY for the above:
NOTIFY sip:AUTO-REG@192.168.100.123:5060 SIP/2.0
Via: SIP/2.0/UDP 192.168.1.100:5060;branch=z9hG4bK2b1379d1903
CSeq: 101 NOTIFY
Date: Wed, 09 Mar 2016 05:28:21 GMT
I traced through the logs and saw that soon after the NOTIFY was placed on the wire, the following was logged in SDL that highlighted the problem:
00024556.000 |07:28:21.385 |SdlSig |StationClose |wait |SIPHandler(1,100,80,1) |SIPStationInit(1,100,73,1) |1,100,10,1.591^192.168.100.123^* |[T:N-H:0,N:0,L:0,V:0,Z:0,D:0] CloseStationReason = 0 StationId = SEPAAAABBBBCCCC 00024546.004 |07:28:21.428 |AppInfo |DB: odbc::SQLException(Error fetching data from datasource: [Informix][Informix ODBC Driver][Informix]Could not insert new row - duplicate value in a UNIQUE INDEX column (Unique Index:assign).) 00024546.005 |07:28:21.429 |AppInfo |DBLException - An error occurred while performing database activities ErrorCode:-239 ExceptionString:Error fetching data from datasource: [Informix][Informix ODBC Driver][Informix]Could not insert new row - duplicate value in a UN App ID:Cisco CallManager Cluster ID:MYCUCM Node ID:mycucmhostame 00024546.006 |07:28:21.429 |AlarmErr |AlarmClass: CallManager, AlarmName: DBLException, AlarmSeverity: Alert, AlarmMessage: , AlarmDescription: An error occurred while performing database activities, AlarmParameters: ErrorCode:-239, ExceptionString:Error fetching data from datasource: [Informix][Informix ODBC Driver][Informix]Could not insert new row - duplicate value in a UN, AppID:Cisco CallManager, ClusterID:MYCUCM, NodeID:mycucmhostame, 00024546.007 |07:28:21.429 |AppInfo |dBProcs - ERROR odbc::SQLException addDevice() 00024546.008 |07:28:21.429 |AppInfo |ProcessDb - ERROR initializing_DbStationAutoRegisterReq - addDevice failed
So my initial thought was a bug with orphaned data. Had to be something along this line as the SDL error was referencing a duplicate. So , I relunctantly decided to go straight to TAC for analysis. I sent this off to TAC and got crickets… The assigned engineer eventually told me this was an NTP issue which didn’t make sense at all.
Another colleague logged a similar case that we had at another client. 2 re-queues later we got a great engineer who identified CSCug20588 as the cause of our problem. It’s a hidden TAC case.
The fix? Well, a RIDICULOUS bug!
When auto-registering a phone in 10.x the Auto-Reg config uses a UDT/ULT. This results in a phone-specific Phone Button Template being created when a phone is inserted into the Informix database, of the form:
The damn PBT was causing the phone not to register! This obvoiusly remained in the list of PBT’s when the phone was deleted from CUCM. When it came to AUTO-REG the phone, CUCM clearly does not look to delete/re-add the PBT or simply use the existing one. Thus the DB insertion error, and the problematic behaviour.
I deleted the PBT and immediately the phone registered.
There’s no fix for this in 10.x and we’re waiting on the BU to confirm an official 11.x SU that will contain the fix for this. One to watch out for!