I checked our Smoothwall box and found that the L2TP daemon (l2tpd) was not running. After I restarted it, it crashed again as soon as one particular user tried to connect. Other users could connect without any problems.
Here is the tail end of the log from /var/log/messages:
l2tpd version 0.70-smoothwall started on smoothwallAF1 PID:28094
Linux version 2.4.34.5-up on a i686, listening on IP address 0.0.0.0, port 1701
<--snip-->
handle_avps: handling avp's for tunnel 61598, call 0
message_type_avp: message type 10 (Incoming-Call-Request)
message_type_avp: new incoming call
ourcid = 13433, entropy_buf = 3479
assigned_session_avp: assigned session id: 1
call_serno_avp: serial number is 0
bearer_type_avp: peer bears: analog
handle_avps: Bad exit status handling attribute 1 (Result Code).
Segmentation fault (core dumped) /modules/tunnel/usr/sbin/l2tpd
So why the sudden problem?
Windows XP users could connect OK, and so could Windows Vista users.
Ahh! The user who was having problems had recently applied Vista SP1.
Using tcpdump, I captured the packets arriving at the Smoothwall on UDP port 1701 and examined the capture file in Wireshark. The problem seems to occur on the ICRQ (Incoming-Call-Request) packet, which the Windows client sends to the Smoothwall server.
For Windows XP clients, the ICRQ contains the following AVPs (Attribute-Value Pairs):
'Control Message', 'Assigned Session', 'Call Serial Number' and 'Bearer Type'.
For the Windows Vista SP1 client, there was an extra AVP tagged onto the end. This is a 'Vendor-Specific' AVP of 'Type 1', specifying a 'Vendor ID' of 311 (0x0137), meaning 'Microsoft'.
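To make the AVP layout concrete, here is a minimal C sketch of decoding one AVP header as described in RFC 2661 section 4.1: a 16-bit word holding the M (mandatory) and H (hidden) flags plus a 10-bit length, followed by a 16-bit Vendor ID and a 16-bit Attribute Type, all in network byte order. The names `avp_info`, `get16()` and `parse_avp()` are hypothetical and are not taken from l2tpd. Decoded this way, the Microsoft AVP reports a vendor_id of 311 and an attr_type of 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded L2TP AVP header (layout per RFC 2661 section 4.1).
 * Hypothetical struct, not l2tpd's own definition. */
struct avp_info {
    int mandatory;        /* M bit */
    int hidden;           /* H bit */
    uint16_t length;      /* whole AVP length in octets, header included */
    uint16_t vendor_id;   /* 0 = IETF-defined, 311 = Microsoft */
    uint16_t attr_type;
    const uint8_t *value; /* points into the caller's buffer */
    size_t value_len;
};

/* Read a big-endian (network order) 16-bit field. */
static uint16_t get16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Decode the AVP at buf. Returns the number of octets it occupies,
 * or 0 if it is truncated or its length field is invalid. */
static size_t parse_avp(const uint8_t *buf, size_t avail, struct avp_info *out)
{
    if (avail < 6)
        return 0;                          /* header alone is 6 octets */
    uint16_t flags_len = get16(buf);
    out->mandatory = (flags_len & 0x8000) != 0;
    out->hidden    = (flags_len & 0x4000) != 0;
    out->length    = flags_len & 0x03FF;   /* low 10 bits */
    if (out->length < 6 || out->length > avail)
        return 0;
    out->vendor_id = get16(buf + 2);
    out->attr_type = get16(buf + 4);
    out->value     = buf + 6;
    out->value_len = out->length - 6u;
    return out->length;
}
```

To walk a whole message, call `parse_avp()` repeatedly, advancing the buffer by its return value; a return of 0 means the message is malformed and should be rejected rather than read past.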
So what is this extra Microsoft AVP? A quick Google search turns up a Cisco document suggesting it may be related to RADIUS. It discusses Vendor-Specific Attributes (VSAs), and its Table 36, which lists the vendor-specific RADIUS IETF attributes, gives Vendor company code 311 with Sub-type 1 as the "MSCHAP-Response" attribute.
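OK, so it seems that the version of l2tpd we are using cannot cope with this extra AVP. Presumably because it looks only at the Attribute Type (1) and ignores the Vendor ID, it mistakes the AVP for a 'Result Code' AVP (attribute type 1 in the IETF namespace), which should only be present in CDN and StopCCN messages.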
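The log line "Bad exit status handling attribute 1 (Result Code)" is consistent with a lookup keyed on the Attribute Type alone, since in the IETF namespace (Vendor ID 0) attribute type 1 is Result Code. The sketch below contrasts a vendor-blind lookup with a vendor-aware one. It is a hypothetical illustration of the failure mode, not l2tpd's actual code; the function names are invented, and the table holds only the first four IETF attribute types from RFC 2661.

```c
#include <stddef.h>
#include <stdint.h>

/* First few IETF (Vendor ID 0) attribute types from RFC 2661. */
static const char *const ietf_avp_names[] = {
    "Message Type",         /* 0 */
    "Result Code",          /* 1 */
    "Protocol Version",     /* 2 */
    "Framing Capabilities", /* 3 */
};
#define N_IETF (sizeof ietf_avp_names / sizeof ietf_avp_names[0])

/* Vendor-blind lookup: the Vendor ID is never consulted, so a
 * Microsoft type-1 AVP is reported as "Result Code". */
static const char *name_ignoring_vendor(uint16_t vendor_id, uint16_t attr)
{
    (void)vendor_id;
    return attr < N_IETF ? ietf_avp_names[attr] : "unknown";
}

/* Vendor-aware lookup: the IETF table applies only when Vendor ID is 0. */
static const char *name_with_vendor(uint16_t vendor_id, uint16_t attr)
{
    if (vendor_id != 0)
        return "vendor-specific";
    return attr < N_IETF ? ietf_avp_names[attr] : "unknown";
}
```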
The relevant RFC is RFC 2661, "Layer Two Tunneling Protocol".
At the top of page 50, the RFC lists the AVPs that MUST be present in an ICRQ and those that MAY be present. A vendor-specific AVP appears in neither list, which would seem to indicate that it is NOT valid at this stage.
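As a cross-check against the page 50 list, the sketch below tests whether an ICRQ carries every AVP that RFC 2661 says MUST be present: Message Type (0), Assigned Session ID (14) and Call Serial Number (15), all with Vendor ID 0. The struct, enum and function names are hypothetical. Fed the AVP lists described above, both the XP ICRQ and the Vista SP1 ICRQ pass; the only difference between them is the trailing Microsoft AVP.

```c
#include <stddef.h>
#include <stdint.h>

/* IETF (Vendor ID 0) attribute types from RFC 2661. */
enum {
    AVP_MESSAGE_TYPE       = 0,
    AVP_ASSIGNED_SESSION   = 14,
    AVP_CALL_SERIAL_NUMBER = 15,
    AVP_BEARER_TYPE        = 18   /* optional (MAY) in an ICRQ */
};

/* Identity of one AVP: which namespace and which attribute. */
struct avp_id {
    uint16_t vendor_id;
    uint16_t attr_type;
};

/* Returns 1 if every AVP that MUST appear in an ICRQ is present
 * with Vendor ID 0, otherwise 0. Extra AVPs are not checked. */
static int icrq_has_mandatory_avps(const struct avp_id *avps, size_t n)
{
    static const uint16_t required[] = {
        AVP_MESSAGE_TYPE, AVP_ASSIGNED_SESSION, AVP_CALL_SERIAL_NUMBER
    };
    for (size_t r = 0; r < sizeof required / sizeof required[0]; r++) {
        int found = 0;
        for (size_t i = 0; i < n; i++) {
            if (avps[i].vendor_id == 0 && avps[i].attr_type == required[r]) {
                found = 1;
                break;
            }
        }
        if (!found)
            return 0;
    }
    return 1;
}
```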
Section 4.1 of the RFC (page 12), which details the format of the AVP, suggests that the problem could be fixed if l2tpd checked the 'Vendor ID' field and ignored any AVP in the ICRQ whose Vendor ID is non-zero.
Looking at the l2tpd source, the function handle_avps() in "avp.c" contains no check that avp->vendorid is zero.
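A fix along those lines might look like the sketch below, which checks the Vendor ID before dispatching on the Attribute Type. Section 4.1 of RFC 2661 also defines the M (Mandatory) bit: an unrecognized AVP with M clear MUST be ignored, while one with M set MUST cause the session to be terminated. Whether simply skipping the Microsoft AVP is compliant therefore depends on its M bit, which the capture would show. This is a hypothetical patch outline, not l2tpd code: `classify_avp()`, `collect_ietf_avps()` and the `AVP_*` values are invented names, and every non-zero Vendor ID is treated as unrecognized.

```c
#include <stddef.h>
#include <stdint.h>

/* What to do with one AVP (hypothetical names). */
enum avp_action {
    AVP_DISPATCH,   /* IETF AVP: pass to the per-attribute handler */
    AVP_IGNORE,     /* unrecognized vendor AVP with M bit clear */
    AVP_FATAL       /* unrecognized vendor AVP with M bit set */
};

/* Decide before dispatching on attr_type, per RFC 2661 section 4.1. */
static enum avp_action classify_avp(uint16_t flags_len, uint16_t vendor_id)
{
    int mandatory = (flags_len & 0x8000) != 0;
    if (vendor_id == 0)
        return AVP_DISPATCH;
    return mandatory ? AVP_FATAL : AVP_IGNORE;
}

/* Walk a message's AVPs and collect the Attribute Types of the IETF
 * AVPs that a type-indexed handler table may safely process. Returns
 * the count, or -1 on a malformed AVP, a mandatory vendor AVP, or a
 * full out[] array. */
static int collect_ietf_avps(const uint8_t *buf, size_t len,
                             uint16_t *out, size_t max)
{
    size_t n = 0;
    while (len > 0) {
        if (len < 6)
            return -1;
        uint16_t flags_len = (uint16_t)((buf[0] << 8) | buf[1]);
        uint16_t avp_len   = flags_len & 0x03FF;
        if (avp_len < 6 || avp_len > len)
            return -1;
        uint16_t vendor = (uint16_t)((buf[2] << 8) | buf[3]);
        uint16_t attr   = (uint16_t)((buf[4] << 8) | buf[5]);
        switch (classify_avp(flags_len, vendor)) {
        case AVP_DISPATCH:
            if (n == max)
                return -1;
            out[n++] = attr;  /* a real daemon would call its handler here */
            break;
        case AVP_IGNORE:
            break;            /* skip the vendor AVP entirely */
        case AVP_FATAL:
            return -1;        /* caller should tear down the session */
        }
        buf += avp_len;
        len -= avp_len;
    }
    return (int)n;
}
```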
For the moment we have no fix, other than to uninstall Vista SP1.