Wednesday, July 29, 2009

Microsoft Office Communicator 2007 R2 and DSCP QoS

Anyone trying to get DSCP working with Office Communicator 2007 R2 on Windows XP needs to grab the 'July' update (v3.5.6907.37), otherwise you will have problems getting video to work. Here are the links:

"Description of the update for Communicator 2007 R2: July 2009"
http://support.microsoft.com/kb/969695/

"Video frames are not displayed in Office Communicator 2007 R2 on a Windows XP-based computer"
http://support.microsoft.com/kb/971846/

(BTW, I'd like to thank Michael Melling for spotting those links, and for helping with the testing that I describe below.)

Before the update, with Office Communicator versions v3.5.6907.0 or v3.5.6907.34 on Windows XP, setting QoSEnabled to 1 leaves audio working fine, but video fails. Most of the time we just got a black video window; sometimes it was a grey window, and sometimes a frozen image.

Using OCS's "Monitoring Server Reports" you can see what's going wrong. Click the "User Call List" link, then filter by one of the users in the call, and this will give you a detailed statistical report of all finished calls. For the Video Stream you will see a "Packet loss Rate" of about 60%!

Office Communicator, with QoSEnabled, works fine on Vista and Windows 7, without this 'July' update. I think that Microsoft must have done the majority of their testing with their latest operating systems, and have only recently got around to testing with Windows XP.

We have been battling this problem for a couple of weeks, so this fix comes just in time! In trying to investigate what was going wrong, we have learned a great deal, so not all the time was wasted, just some!

DSCP QoS is enabled in the registry, like this:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\RTC\Transport
"QoSEnabled"=dword:00000001

.. as per these instructions:

"Enabling DSCP Marking"
http://technet.microsoft.com/en-us/library/dd441192(office.13).aspx

BTW, you need to exit (not just close) Office Communicator and restart it, for this registry setting to take effect.

Once Office Communicator has set up the call using SIP (over TCP) via Office Communications Server 2007 R2, all audio & video network traffic is sent directly between the two clients as RTP (Real-time Transport Protocol) packets carried over UDP.

Using WireShark, we could see the packets being tagged correctly:

Audio - DSCP 0x28: Class Selector CS5
Video - DSCP 0x18: Class Selector CS3

BTW, these are the default values. You can change them using a group policy.

In WireShark, to just grab the audio & video traffic, use a capture filter like this:

udp and ip[1]!=0

This will match any UDP packet whose IP header has a non-zero TOS byte, which is where the DSCP field lives. You can use the following display filters to select either Audio or Video packets:

For Audio, use:
ip.dsfield.dscp == 40

For Video, use:
ip.dsfield.dscp == 24

BTW, if you use tcpdump to capture the packets, remember that ip[1] matches the whole TOS byte, so the DSCP values are multiplied by four. Use these capture filters with tcpdump (the hex and decimal forms are equivalent):

For Audio:
ip[1]=0xA0
ip[1]=160

For Video:
ip[1]=0x60
ip[1]=96
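
The factor of four comes from the layout of that byte: the DSCP value sits in the upper six bits and the lower two bits are used for ECN, so the byte is the DSCP value shifted left by two. Here is a small Python sketch of the arithmetic. The function names are my own, chosen purely for illustration.

```python
def dscp_to_tos(dscp):
    """Return the full TOS/DS byte for a 6-bit DSCP value (ECN bits zero)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0-63)")
    return dscp << 2


def tos_to_dscp(tos):
    """Return the DSCP value carried in a TOS/DS byte, ignoring the ECN bits."""
    return (tos & 0xFF) >> 2


def class_selector(dscp):
    """Return 'CSn' if the DSCP value is a Class Selector code point, else None."""
    if 0 <= dscp <= 56 and dscp % 8 == 0:
        return "CS%d" % (dscp // 8)
    return None


# The two default markings used by Office Communicator, as listed earlier.
for name, dscp in (("Audio", 0x28), ("Video", 0x18)):
    tos = dscp_to_tos(dscp)
    print("%s: DSCP %d (%s) -> ip[1]=0x%02X (%d)"
          % (name, dscp, class_selector(dscp), tos, tos))
```

Running it reproduces the values above: 0xA0 (160) for audio and 0x60 (96) for video.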

It is very useful to be able to see the RTP headers in the packets. To do this in WireShark, right-click on any of the packets of a UDP stream, and select 'Decode As..', then select 'Transport' as RTP.

Once you have WireShark decoding the RTP header, you can see a couple of interesting fields in the header. The first is 'Payload type', and this is set as follows:

Type 97 - RT-Audio encoded redundant data
Type 111 - 'Siren' audio - used by 'LiveMeeting'
Type 114 - RTaudio (x-msrta)
Type 118 - 'Comfort Noise'
Type 121 - RTvideo (x-rtvc1)
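
If you want to pull these fields out of a capture yourself, the fixed part of the RTP header is only 12 bytes. The following is a minimal Python sketch, not anything taken from Office Communicator: the function and tuple names are my own, the sample packet is made up, and the payload type table simply repeats the list above. These are dynamic payload types, so the numbers apply to these captures and are not universal.

```python
import struct
from collections import namedtuple

# Payload types seen in the Office Communicator captures described above.
PAYLOAD_TYPES = {
    97: "RT-Audio encoded redundant data",
    111: "Siren audio",
    114: "RTaudio (x-msrta)",
    118: "Comfort Noise",
    121: "RTvideo (x-rtvc1)",
}

RtpHeader = namedtuple(
    "RtpHeader", "version marker payload_type sequence timestamp ssrc")


def parse_rtp_header(udp_payload):
    """Decode the 12-byte fixed RTP header at the start of a UDP payload."""
    if len(udp_payload) < 12:
        raise ValueError("too short to be an RTP packet")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", udp_payload[:12])
    return RtpHeader(
        version=b0 >> 6,          # top two bits, should be 2
        marker=bool(b1 & 0x80),   # top bit of the second byte
        payload_type=b1 & 0x7F,   # low seven bits of the second byte
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


# Made-up sample: version 2, payload type 121, sequence number 4660.
sample = bytes([0x80, 121]) + struct.pack("!HII", 4660, 160000, 0x1234ABCD)
hdr = parse_rtp_header(sample)
print(hdr.payload_type, PAYLOAD_TYPES.get(hdr.payload_type, "unknown"), hdr.sequence)
```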

The second interesting RTP header field is the 'Sequence number'. This is a 16-bit counter that increments by 1 for each packet sent in a particular stream, and is useful for detecting dropped or duplicated packets. (The sequence number seems to start at a random value.)

There is a useful feature in WireShark which will do 'Stream Analysis..'. On WireShark v1.2.0 you will find this under the 'Telephony > RTP' menu. This will highlight where the sequence number goes wrong.
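
To illustrate how the sequence number exposes problems, here is a rough Python sketch that walks a list of sequence numbers (in capture order, for one stream) and reports duplicates and gaps, allowing for the 16-bit wrap-around. It is my own simplification, not WireShark's algorithm, and it assumes the capture is shorter than one full wrap of the counter (65536 packets).

```python
def analyse_sequence(seqs):
    """Return (duplicates, missing) for one stream's RTP sequence numbers.

    Assumes fewer than 65536 packets, so a repeated value is a true duplicate.
    """
    duplicates, missing = [], []
    seen = set()
    expected = None
    for seq in seqs:
        if seq in seen:
            duplicates.append(seq)
            continue
        seen.add(seq)
        if expected is None:
            expected = (seq + 1) & 0xFFFF
            continue
        gap = (seq - expected) & 0xFFFF   # distance modulo 2**16
        if gap < 0x8000:
            # Forward jump: everything between expected and seq was skipped.
            missing.extend((expected + i) & 0xFFFF for i in range(gap))
            expected = (seq + 1) & 0xFFFF
        elif seq in missing:
            # Late (reordered) packet that fills an earlier gap.
            missing.remove(seq)
    return duplicates, missing


# Made-up example that wraps from 65535 to 0, repeats 1 and skips 2 and 3.
dups, lost = analyse_sequence([65534, 65535, 0, 1, 1, 4])
print("duplicates:", dups)
print("missing:", lost)
```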

OK, so why was the video failing (before the 'July' update for Office Communicator)? We had two laptops running Office Communicator, connected together via an HP ProCurve switch, and enabled port mirroring on the switch to a third laptop so we could see all network traffic. Running WireShark on the client sending video, we saw a lot more video packets than were being received at the other end, on the receiving client. The third monitoring laptop confirmed that packets were being dropped before they emerged 'onto the wire' from the sending laptop. Looking closely at the WireShark capture from the laptop sending video, and using 'Stream Analysis..', we saw many video packets with duplicate RTP sequence numbers! Correlating with the capture on the receiving client, we saw that any packet that had been duplicated was missing. It was not just the duplicate copy that went missing, but the original packet as well. Thus, from the point of view of the receiving client, there was a massive level of packet loss, hence it could not display the video stream.

BTW, on Windows XP, you can use 'tcmon.exe' from the resource kit to see what QoS policies are in effect, and this confirms packet loss on the video stream. Here are some screen shots:
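
The correlation between the two captures can be sketched in a few lines of Python. This is only an illustration of the comparison we did by eye: the function name is my own, and the sequence numbers are made-up sample data, not values from our captures.

```python
from collections import Counter


def correlate(sent_seqs, received_seqs):
    """Compare RTP sequence numbers from the sender's and receiver's captures.

    Returns (duplicated, lost, loss_rate), where 'duplicated' are sequence
    numbers appearing more than once at the sender, 'lost' are sequence
    numbers that never arrived, and 'loss_rate' is lost / distinct sent.
    """
    sent_counts = Counter(sent_seqs)
    received = set(received_seqs)
    duplicated = {seq for seq, n in sent_counts.items() if n > 1}
    lost = set(sent_counts) - received
    loss_rate = len(lost) / len(sent_counts) if sent_counts else 0.0
    return duplicated, lost, loss_rate


# Hypothetical data: 11 and 13 were duplicated by the sender and never arrived.
sent = [10, 11, 11, 12, 13, 13, 14]
received = [10, 12, 14]
duplicated, lost, rate = correlate(sent, received)
print("duplicated at sender:", sorted(duplicated))
print("missing at receiver: ", sorted(lost))
print("lost == duplicated:  ", lost == duplicated)
print("loss rate: %.0f%%" % (rate * 100))
```

If the set of lost sequence numbers equals the set of duplicated ones, you are seeing the same pattern we saw.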




Unfortunately tcmon does not run on Vista or Windows 7, which is very disappointing. Let's hope that Microsoft get around to fixing this!

BTW, if you are using Linux to prioritise the audio & video traffic as it is sent into the WAN, like we are, you will find the following iptables & 'tc filters' useful to match the packets:

# Audio:
iptables -t mangle -A POSTROUTING -o eth0 -p udp -m dscp --dscp 40 -j mark1
iptables -t mangle -A POSTROUTING -o eth0 -p udp -m dscp --dscp-class CS5 -j mark1
tc filter add dev eth0 parent 10:0 prio 1 protocol ip u32 match ip tos 0xA0 0xff flowid 10:200

# Video:
iptables -t mangle -A POSTROUTING -o eth0 -p udp -m dscp --dscp 24 -j mark2
iptables -t mangle -A POSTROUTING -o eth0 -p udp -m dscp --dscp-class CS3 -j mark2
tc filter add dev eth0 parent 10:0 prio 1 protocol ip u32 match ip tos 0x60 0xff flowid 10:210
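
To check rules like these without setting up a call, you can generate your own marked UDP traffic. The Python sketch below is one possible way, assuming a Linux sender (Windows normally ignores an application's IP_TOS setting). The function names are my own, and any destination address and port you pass in are your own choice, not anything Office Communicator uses.

```python
import socket


def open_marked_udp_socket(dscp):
    """Return a UDP socket whose outgoing packets carry the given DSCP value."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0-63)")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # IP_TOS takes the whole TOS byte, so shift the DSCP value left by two.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)
    return sock


def send_test_packets(dscp, host, port, count=10):
    """Send a few small UDP datagrams marked with the given DSCP value."""
    sock = open_marked_udp_socket(dscp)
    try:
        for i in range(count):
            sock.sendto(b"dscp test %d" % i, (host, port))
    finally:
        sock.close()
```

For example, send_test_packets(40, host, port) sends traffic marked like the audio stream and send_test_packets(24, host, port) like the video stream. You can then watch the iptables counters or 'tc -s' statistics to see whether the packets are matched.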

Best Regards
Nigel Smith

OpenSolaris 2009-06 and the Comstar iScsi Target

Having upgraded from OpenSolaris 2008-11 to 2009-06, I wanted to try the new Comstar iscsi target, to see how well it worked, and to see if it was ready to replace the 'old' iscsi target. And I wanted to see how far I could get with the package versions in the 'production-quality' release repository, based on snv_111. To start with, I just wanted to get something basic going as quickly as possible.

The first task was to disable the old iscsi target (to avoid a conflict with the new target, which would also want to listen on TCP port 3260), and to enable the 'SCSI Target Mode Framework' - stmf.

# svcs iscsitgt
online Jul_09 svc:/system/iscsitgt:default
# svcs stmf
disabled Jul_09 svc:/system/stmf:default
# svcadm disable iscsitgt
# svcs iscsitgt
STATE STIME FMRI
disabled 21:29:57 svc:/system/iscsitgt:default
# svcadm enable stmf
# svcs stmf
STATE STIME FMRI
online 21:30:55 svc:/system/stmf:default
# stmfadm list-state
Operational Status: online
Config Status : initialized

Next I created a new small zfs volume.
(My zfs pool is named 'rz2pool' - it's Raidz2 - dual parity.)

# zfs create -V 4g rz2pool/iscsi_lun1

And then created a 'logical unit', using the zvol as its backing store.

# sbdadm create-lu /dev/zvol/rdsk/rz2pool/iscsi_lun1

Created the following LU:

GUID                              DATA SIZE            SOURCE
--------------------------------  -------------------  ----------------
600144f00008278f04694a5cf0980001  4294901760           /dev/zvol/rdsk/rz2pool/iscsi_lun1

# stmfadm list-lu -v
LU Name: 600144F00008278F04694A5CF0980001
Operational Status: Online
Provider Name : sbd
Alias : /dev/zvol/rdsk/rz2pool/iscsi_lun1
View Entry Count : 0

Now, using the GUID of the logical unit, we make it visible to all initiators by adding a 'view'.

# stmfadm add-view 600144f00008278f04694a5cf0980001
# stmfadm list-view -l 600144f00008278f04694a5cf0980001
View Entry: 0
Host group : All
Target group : All
LUN : 0

Next we need to do the iscsi part, for which we need to install the iscsi target and the 'itadm' command. These were not already installed so I had to add the 'SUNWiscsit' package.

# pkg install -v SUNWiscsit
Creating Plan | Before evaluation:
UNEVALUATED:
+pkg:/SUNWiscsit@0.5.11,5.11-0.111:20090508T161047Z

After evaluation:
None -> pkg:/SUNWiscsit@0.5.11,5.11-0.111:20090508T161047Z
None -> pkg:/SUNWiscsidm@0.5.11,5.11-0.111:20090508T161041Z
Actuators:
restart_fmri: svc:/system/manifest-import:default
None
DOWNLOAD PKGS FILES XFER (MB)
Completed 2/2 23/23 0.66/0.66

PHASE ACTIONS
Install Phase 76/76
PHASE ITEMS
Reading Existing Index 8/8
Indexing Packages 2/2

# pkg list -v SUNWiscsit
FMRI STATE UFIX
pkg:/SUNWiscsit@0.5.11,5.11-0.111:20090508T161047Z installed ----
# pkg list -v SUNWiscsidm
FMRI STATE UFIX
pkg:/SUNWiscsidm@0.5.11,5.11-0.111:20090508T161041Z installed ----

On the next step I had a small problem.
(BTW, I am not using NWAM to configure my network card.)

# svcadm enable -r iscsi/target:default
svcadm: svc:/milestone/network depends on svc:/network/physical, which has multiple instances.

# svcs -a | grep network/physical
disabled Jul_09 svc:/network/physical:nwam
online Jul_09 svc:/network/physical:default

# svcadm enable iscsi/target
# svcs -a | grep iscsi
disabled Jul_09 svc:/network/iscsi_initiator:default
disabled 21:29:57 svc:/system/iscsitgt:default
maintenance 22:29:13 svc:/network/iscsi/target:default

# svcs -xv
svc:/network/iscsi/target:default (iscsi target)
State: maintenance since Tue Jul 14 22:29:13 2009
Reason: Start method failed repeatedly, last exited with status 4.
See: http://sun.com/msg/SMF-8000-KS
See: man -M /usr/share/man -s 1M itadm
See: /var/svc/log/network-iscsi-target:default.log
Impact: This service is not running.

# tail /var/svc/log/network-iscsi-target\:default.log
[ Jul 14 22:29:13 Enabled. ]
[ Jul 14 22:29:13 Executing start method ("/lib/svc/method/iscsi-target start"). ]
iscsi-target: Requesting to enable iscsi target
open failed: INVALIDUnable to open device /devices/pseudo/iscsit@0:iscsit[ Jul 14 22:29:13 Method "start" exited with status 4. ]

Rebooting solved this problem.

# svcs -a | grep iscsi
disabled 19:25:15 svc:/network/iscsi_initiator:default
disabled 19:25:18 svc:/system/iscsitgt:default
online 19:25:37 svc:/network/iscsi/target:default

So now we create the iscsi target and check it looks ok:

# itadm create-target
Target iqn.1986-03.com.sun:02:7e969860-cb8d-475a-ddbb-97bf2f22ce7b successfully created

# itadm list-target -v
TARGET NAME STATE SESSIONS
iqn.1986-03.com.sun:02:7e969860-cb8d-475a-ddbb-97bf2f22ce7b online 0
alias: -
auth: none (defaults)
targetchapuser: -
targetchapsecret: unset
tpg-tags: default

I was then able to discover and logon to this iscsi target from a Windows 2003 server, using the Microsoft iscsi initiator.

The above is just the simplest possible configuration, just to allow some simple tests. For a more practical setup, you may well want multiple logical units, and want to put some restrictions on the views. But details on how to do that will have to wait for a follow-up post to this blog!

I did notice a couple of things fairly quickly. I saw the connection between the initiator and target was dropping every 50 seconds, followed immediately by a re-logon. Checking with WireShark showed that it was the target starting this with a [FIN, ACK]. With the old iscsi target I would have used DTrace to confirm this, but the iscsi target provider for the new iscsi target was not available.

I looked at the source code for the comstar iscsi target:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/io/comstar/port/iscsit/iscsit.c

.. and from the history link, I could see that a couple of likely fixes had already been committed:

26-May-2009 Priya Krishnan - 6809997 COMSTAR iscsi target DTrace Provider needed
08-May-2009 Peter Dunlap - 6755803 win2003 initiator numerous iscsi connection lost and connection retries mesgs to iscsi target

Now normally, I would hyper-link those bug numbers for you. But in this case there is no point, because Sun has chosen not to allow details of those bugs to be publicly available. (Or any of the other comstar iscsi target bugs AFAIK.)

Having those fixes is important for me, so it looks like I will have to update my OpenSolaris 2009-06 using the development repository to snv_118.