Had an issue that’s affected multiple deployments that I’ve been working on. At one of my clients in particular, this reached an escalation level. Just sharing this to assist others, as we went through 4 TAC Engineers to get a root cause and eventually this was attributed to a hidden Cisco defect in 10.x and above… The first engineer told us that we needed to rebuild an 18 node CUCM cluster from scratch to fix this!
Just sharing this to assist others, as we went through 4 TAC Engineers to get a root cause and eventually this was attributed to a hidden Cisco defect in 10.x and above… The first engineer told us that we needed to rebuild an 18 node CUCM cluster from scratch to fix this! Don’t always believe TAC!
Problem Description
- Client complains that auto-registered phones are failing to register
- Eventually determined that the phone had been previously registered and what subsequently deleted
- Certain clients say that if they manually configure the phone, it registers
- Phone logs show a TFTP config error
Went on site and noted that there were persistent DB-related syslog alarms being generated.
The SDL logs showed the registration failure and a syslog event was logged:
00024541.012 |07:28:21.243 |AppInfo |SIPStationD(269) - checkDNsReceived: TotalCount=0 00024541.013 |07:28:21.243 |AppInfo |SIPStationD(269) - sendLineRegisterReq: DN match failed for AUTO-REG 00024541.014 |07:28:21.243 |AppInfo |SIPStationD(269) - sendLineRegisterReq: Phone is autoregistering...sending 200OK followed by a restart request 00024541.015 |07:28:21.243 |AppInfo |SIPStationD(269) - buildServiceControl: Sending callid from mPrimaryLineNum 0: 0UTqUn-9I4FebBfs1swQbo.GQ78HIxyf 00024541.016 |07:28:21.243 |AppInfo |SIPStationD(269) - sendServiceControlNotify: starting servicecontrol NOTIFY timer 00024541.017 |07:28:21.243 |AppInfo |SIPStationD(269) - sendServiceControlNotify: sending service-control NOTIFY 00024541.018 |07:28:21.243 |AppInfo |EndPointTransientConnection - An endpoint attempted to register but did not complete registration Connecting Port:5060 Device name:SEPAAAABBBBCCCC Device IP address:192.168.100.123 Device type:592 Reason Code:7 Protocol:SIP Device MAC address:AAAABBBBCCCC IPAddressAttributes:0 LastSignalReceived:SIPRegisterInd StationState:auto_register App ID:Cisco CallManager Cluster ID:MYCUCM Node ID:mycucmhostame 00024541.019 |07:28:21.243 |AlarmErr |AlarmClass: CallManager, AlarmName: EndPointTransientConnection, AlarmSeverity: Error, AlarmMessage: , AlarmDescription: An endpoint attempted to register but did not complete registration, AlarmParameters: ConnectingPort:5060, DeviceName:SEPAAAABBBBCCCC, IPAddress:192.168.100.123, DeviceType:592, Reason:7, Protocol:SIP, MACAddress:AAAABBBBCCCC, IPAddrAttributes:0, LastSignalReceived:SIPRegisterInd, StationState:auto_register, AppID:Cisco CallManager, ClusterID:MYCUCM, NodeID:mycucmhostame, 00024542.000 |07:28:21.243 |AppInfo |-->RISCMAccess::DeviceTransientConnection(...) 00024543.000 |07:28:21.243 |AppInfo |Device Transient deviceName : SEPAAAABBBBCCCC, IPAddress : 192.168.100.123, IPv6Address : not shown, IPv4Attribute :0, IPv6Attribute :0, Protocol : 2 00024544.000 |07:28:21.243 |AppInfo |DebugMsg deviceName : SEPAAAABBBBCCCC, DeviceType : 592, risClass: 1 00024545.000 |07:28:21.243 |AppInfo |<--RISCMAccess::DeviceTransientConnection(...) 00024541.020 |07:28:21.243 |AppInfo |SIPStationD(269) - Received register from Cisco phone for unconfigured line AUTO-REG; shutting down 00024541.021 |07:28:21.243 |AppInfo |StationTracking::updateMismatchCount: deviceId=SEPAAAABBBBCCCC, mismatchCount=174 00024541.022 |07:28:21.243 |AppInfo |SIPStationD(269) - processVersionStampMismatch: VersionStampMismatch detected, count = 174 00024541.023 |07:28:21.243 |AppInfo |SIPStationD(269) - DevStat-StopClose: LineRegisterReq send failure
And later we in the same trace see that CUCM sends a NOTIFY for the above:
NOTIFY sip:AUTO-REG@192.168.100.123:5060 SIP/2.0
Via: SIP/2.0/UDP 192.168.1.100:5060;branch=z9hG4bK2b1379d1903
From: <sip:192.168.1.100>;tag=789090571
To: <sip:AUTO-REG@192.168.100.123>
Call-ID: bb580e00-6df1b475-27b-247410ac@192.168.1.100
CSeq: 101 NOTIFY
Max-Forwards: 70
Date: Wed, 09 Mar 2016 05:28:21 GMT
User-Agent: Cisco-CUCM11.0
Event: service-control
Subscription-State: active
Contact: <sip:192.168.1.100:5060>
Content-Type: text/plain
Content-Length: 67action=restart
RegisterCallId={0UTqUn-9I4FebBfs1swQbo.GQ78HIxyf}
I traced through the logs and saw that soon after the NOTIFY was placed on the wire, the following was logged in SDL that highlighted the problem:
00024556.000 |07:28:21.385 |SdlSig |StationClose |wait |SIPHandler(1,100,80,1) |SIPStationInit(1,100,73,1) |1,100,10,1.591^192.168.100.123^* |[T:N-H:0,N:0,L:0,V:0,Z:0,D:0] CloseStationReason = 0 StationId = SEPAAAABBBBCCCC 00024546.004 |07:28:21.428 |AppInfo |DB: odbc::SQLException(Error fetching data from datasource: [Informix][Informix ODBC Driver][Informix]Could not insert new row - duplicate value in a UNIQUE INDEX column (Unique Index:assign).) 00024546.005 |07:28:21.429 |AppInfo |DBLException - An error occurred while performing database activities ErrorCode:-239 ExceptionString:Error fetching data from datasource: [Informix][Informix ODBC Driver][Informix]Could not insert new row - duplicate value in a UN App ID:Cisco CallManager Cluster ID:MYCUCM Node ID:mycucmhostame 00024546.006 |07:28:21.429 |AlarmErr |AlarmClass: CallManager, AlarmName: DBLException, AlarmSeverity: Alert, AlarmMessage: , AlarmDescription: An error occurred while performing database activities, AlarmParameters: ErrorCode:-239, ExceptionString:Error fetching data from datasource: [Informix][Informix ODBC Driver][Informix]Could not insert new row - duplicate value in a UN, AppID:Cisco CallManager, ClusterID:MYCUCM, NodeID:mycucmhostame, 00024546.007 |07:28:21.429 |AppInfo |dBProcs - ERROR odbc::SQLException addDevice() 00024546.008 |07:28:21.429 |AppInfo |ProcessDb - ERROR initializing_DbStationAutoRegisterReq - addDevice failed
So my initial thought was a bug with orphaned data. Had to be something along this line as the SDL error was referencing a duplicate. So , I relunctantly decided to go straight to TAC for analysis. I sent this off to TAC and got crickets… The assigned engineer eventually told me this was an NTP issue which didn’t make sense at all.
Another colleague logged a similar case that we had at another client. 2 re-queues later we got a great engineer who identified CSCug20588 as the cause of our problem. It’s a hidden TAC case.
The fix? Well, a RIDICULOUS bug!
When auto-registering a phone in 10.x the Auto-Reg config uses a UDT/ULT. This results in a phone-specific Phone Button Template being created when a phone is inserted into the Informix database, of the form:
SEP<MAC-ADDRESS>-SIP-Individual Template
The damn PBT was causing the phone not to register! This obvoiusly remained in the list of PBT’s when the phone was deleted from CUCM. When it came to AUTO-REG the phone, CUCM clearly does not look to delete/re-add the PBT or simply use the existing one. Thus the DB insertion error, and the problematic behaviour.
I deleted the PBT and immediately the phone registered.
There’s no fix for this in 10.x and we’re waiting on the BU to confirm an official 11.x SU that will contain the fix for this. One to watch out for!
Nice – we just hit this as well…. thanks for sharing.
LikeLike
Pleasure man, hope it helped.
LikeLiked by 1 person
This affected us too, thanks for the tip!
LikeLike
It’s a pleasure! 🙂
LikeLike
I just ran into this on CUCM 10.5.2. Here are a couple of SQL querys you can run from the CLI to list and delete any PBTs that start with SEP* and are not assigned to a phone. These worked for me on 10.5.2, but be careful. I had a few hundred that needed to be cleaned out.
To list the unused SEP PBTs (if it’s just a few, manually delete them):
run sql SELECT name FROM phonetemplate WHERE 0 = (SELECT COUNT (*) FROM device WHERE device.fkphonetemplate = phonetemplate.pkid) AND phonetemplate.name like ‘SEP%’
To bulk delete unused SEP PBTs:
run sql DELETE FROM phonetemplate WHERE 0 = (SELECT COUNT (*) FROM device WHERE device.fkphonetemplate = phonetemplate.pkid) AND phonetemplate.name LIKE ‘SEP%’
Thanks, Randy
LikeLiked by 1 person
Awesome share, thanks for this.
LikeLike
Dude, nicely done. Sad to report it’s still happening in 11.5
LikeLike
Sorry to hear its not yet fixed… Glad this helped…
LikeLike
Thanks a lot Randy for those SQL commands! That worked for me!
LikeLiked by 1 person
Just had the same thing on 12.5. 😦
LikeLiked by 1 person
Sorry to hear this man…
LikeLike
Thanks for taking the time to document this issue so precisely, top man!
LikeLiked by 1 person
Its a pleasure Colin
LikeLike