Thursday, June 25, 2009

Using SNMP with a Cisco router to troubleshoot & diagnose ADSL problems

We've been having numerous errors and problem with an ADSL line at the company where I work. We're trying to run an Ipsec tunnel across the ADSL line to connect to one of our small branch offices. Every few hours we would have problems, with the tunnel dropping out. Restarting the router would often fix the problem, but the problems never went away for long.

We were using good quality Cisco 1801 routers. The most useful command for debugging is 'show dsl interface atm0' and 'show int atm0'. Below is some typical output, while the ADSL line was faulty:

#show int atm 0
ATM0 is up, line protocol is up
Hardware is MPC ATMSAR (with Alcatel ADSL Module)
MTU 4470 bytes, sub MTU 4470, BW 832 Kbit/sec, DLY 440 usec,
reliability 255/255, txload 41/255, rxload 6/255
Encapsulation ATM, loopback not set
Encapsulation(s): AAL5 AAL2, PVC mode
23 maximum active VCs, 1024 VCs per VP, 1 current VCCs
VC Auto Creation Disabled.
VC idle disconnect time: 300 seconds
Last input never, output 00:00:00, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 48
Queueing strategy: Per VC Queueing
5 minute input rate 22000 bits/sec, 20 packets/sec
5 minute output rate 137000 bits/sec, 24 packets/sec
322340 packets input, 50047091 bytes, 0 no buffer
Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
0 input errors, 182 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
345336 packets output, 145468959 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 output buffer failures, 0 output buffers swapped out

#show dsl interface atm0
ATM0
Alcatel 20190 chipset information
ATU-R (DS) ATU-C (US)
Modem Status: Showtime (DMTDSL_SHOWTIME)
DSL Mode: ITU G.992.1 (G.DMT) Annex A
ITU STD NUM: 0x03 0x2
Vendor ID: 'STMI' 'P '
Vendor Specific: 0x0000 0x0000
Vendor Country: 0x0F 0xB5
Chip ID: C196 (0)
DFE BOM: DFE3.0 Annex A (1)
Capacity Used: 100% 85%
Noise Margin: 1.0 dB 11.0 dB
Output Power: 20.0 dBm 12.5 dBm
Attenuation: 34.0 dB 17.0 dB
Defect Status: None None
Last Fail Code: None
Watchdog Counter: 0xA7
Watchdog Resets: 0
Selftest Result: 0x00
Subfunction: 0x00
Interrupts: 4155 (0 spurious)
PHY Access Err: 0
Activations: 3
LED Status: ON
LED On Time: 100
LED Off Time: 100
Init FW: init_AMR-3.0.014_no_bist.bin
Operation FW: AMR-3.0.014.bin
FW Source: embedded
FW Version: 3.0.14

Interleave Fast Interleave Fast
Speed (kbps): 4640 0 832 0
Cells: 716617 0 108647003 0
Reed-Solomon EC: 43244 0 9 9
CRC Errors: 25362 0 22 14
Header Errors: 15617 0 9 7
Total BER: 1610E-8 0E-0
Leakage Average BER: 2734E-8 0E-0
ATU-R (DS) ATU-C (US)
Bitswap: enabled enabled
Bitswap success: 0 0
Bitswap failure: 0 0

You can see from the above that the 'Down Stream' (DS) noise margin is very low at 1dB. We have also seen 0.5dB reported, or even 0db! You can also see the error counts are high. In the output of 'show int atm 0', the figure to monitor is the number of CRC errors - in the output above it's 182. It is also useful to set 'debug atm errors' on the cisco. The in the syslog's you will see messages like:

*Jun 19 10:22:01.894: ATM0: AAL5 rx errors (status = 0C030000)
*Jun 19 10:22:01.974: ATM0: AAL5 rx errors (status = 2C030000)
*Jun 19 10:22:02.758: ATM0: AAL5 rx errors (status = 0C030000)
*Jun 19 10:22:04.058: ATM0: AAL5 rx errors (status = 2C070000)
*Jun 19 10:22:04.922: ATM0: AAL5 rx errors (status = 0C030000)
*Jun 19 10:22:06.174: ATM0: AAL5 rx errors (status = 0C030000)
*Jun 19 10:22:07.822: ATM0: AAL5 rx errors (status = 0C030000)

To try and dig deeper, I used SNMP (from a Linux box on the same LAN), to query the router:

[root@hexdns1 snmp_queries]# snmpwalk -v1 -c public 192.168.6.1 1.3.6.1.2.1.10.94
SNMPv2-SMI::transmission.94.1.1.1.1.1.19 = INTEGER: 2
SNMPv2-SMI::transmission.94.1.1.1.1.2.19 = INTEGER: 3
SNMPv2-SMI::transmission.94.1.1.2.1.2.19 = STRING: "P"
SNMPv2-SMI::transmission.94.1.1.2.1.3.19 = STRING: "0"
SNMPv2-SMI::transmission.94.1.1.2.1.4.19 = Gauge32: 110
SNMPv2-SMI::transmission.94.1.1.2.1.5.19 = Gauge32: 175
SNMPv2-SMI::transmission.94.1.1.2.1.7.19 = Gauge32: 200
SNMPv2-SMI::transmission.94.1.1.2.1.8.19 = Gauge32: 4492000
SNMPv2-SMI::transmission.94.1.1.3.1.1.19 = ""
SNMPv2-SMI::transmission.94.1.1.3.1.2.19 = STRING: "STMI"
SNMPv2-SMI::transmission.94.1.1.3.1.3.19 = STRING: "0"
SNMPv2-SMI::transmission.94.1.1.3.1.4.19 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.3.1.5.19 = Gauge32: 340
SNMPv2-SMI::transmission.94.1.1.3.1.7.19 = Gauge32: 130
SNMPv2-SMI::transmission.94.1.1.3.1.8.19 = Gauge32: 967000
SNMPv2-SMI::transmission.94.1.1.4.1.2.19 = Gauge32: 4448000
SNMPv2-SMI::transmission.94.1.1.5.1.2.19 = Gauge32: 832000
SNMPv2-SMI::transmission.94.1.1.6.1.1.19 = Counter32: 697
SNMPv2-SMI::transmission.94.1.1.6.1.2.19 = Counter32: 449
SNMPv2-SMI::transmission.94.1.1.6.1.4.19 = Counter32: 0
SNMPv2-SMI::transmission.94.1.1.6.1.5.19 = Counter32: 1820
SNMPv2-SMI::transmission.94.1.1.6.1.6.19 = Counter32: 31
SNMPv2-SMI::transmission.94.1.1.6.1.10.19 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.6.1.11.19 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.6.1.13.19 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.6.1.14.19 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.6.1.15.19 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.6.1.17.19 = Gauge32: 565
SNMPv2-SMI::transmission.94.1.1.6.1.18.19 = Gauge32: 381
SNMPv2-SMI::transmission.94.1.1.6.1.20.19 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.6.1.21.19 = Gauge32: 1034
SNMPv2-SMI::transmission.94.1.1.6.1.22.19 = Gauge32: 10
SNMPv2-SMI::transmission.94.1.1.7.1.1.19 = Counter32: 45
SNMPv2-SMI::transmission.94.1.1.7.1.2.19 = Counter32: 28
SNMPv2-SMI::transmission.94.1.1.7.1.3.19 = Counter32: 0
SNMPv2-SMI::transmission.94.1.1.7.1.4.19 = Counter32: 15043
SNMPv2-SMI::transmission.94.1.1.7.1.8.19 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.7.1.9.19 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.7.1.10.19 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.7.1.11.19 = Gauge32: 328
SNMPv2-SMI::transmission.94.1.1.7.1.13.19 = Gauge32: 44
SNMPv2-SMI::transmission.94.1.1.7.1.14.19 = Gauge32: 27
SNMPv2-SMI::transmission.94.1.1.7.1.15.19 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.7.1.16.19 = Gauge32: 3717
SNMPv2-SMI::transmission.94.1.1.10.1.3.19 = Counter32: 1864592813
SNMPv2-SMI::transmission.94.1.1.10.1.4.19 = Counter32: 804238
SNMPv2-SMI::transmission.94.1.1.10.1.10.19 = Gauge32: 2538597
SNMPv2-SMI::transmission.94.1.1.10.1.11.19 = Gauge32: 11493
SNMPv2-SMI::transmission.94.1.1.10.1.15.19 = Gauge32: 804265846
SNMPv2-SMI::transmission.94.1.1.10.1.16.19 = Gauge32: 400641
SNMPv2-SMI::transmission.94.1.1.11.1.3.19 = Counter32: 31565
SNMPv2-SMI::transmission.94.1.1.11.1.4.19 = Counter32: 56466
SNMPv2-SMI::transmission.94.1.1.11.1.10.19 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.11.1.11.19 = Gauge32: 0
SNMPv2-SMI::transmission.94.1.1.11.1.15.19 = Gauge32: 18799
SNMPv2-SMI::transmission.94.1.1.11.1.16.19 = Gauge32: 44213

If you have the ADSL-LINE-MIB in place, these OID's can be translated:

# snmptranslate 1.3.6.1.2.1.10.94
SNMPv2-SMI::transmission.94
# export MIBS=ALL
# snmptranslate 1.3.6.1.2.1.10.94
ADSL-LINE-MIB::adslMIB

For a guide on how to use Linux SNMP, see my snmp tutorial.

# snmpwalk -v1 -c public 192.168.6.1 1.3.6.1.2.1.10.94
ADSL-LINE-MIB::adslLineCoding.19 = INTEGER: 2
ADSL-LINE-MIB::adslLineType.19 = INTEGER: interleavedOnly(3)
ADSL-LINE-MIB::adslAtucInvVendorID.19 = STRING: P
ADSL-LINE-MIB::adslAtucInvVersionNumber.19 = STRING: 0
ADSL-LINE-MIB::adslAtucCurrSnrMgn.19 = Wrong Type (should be INTEGER): Gauge32: 110
ADSL-LINE-MIB::adslAtucCurrAtn.19 = Gauge32: 170 tenth dB
ADSL-LINE-MIB::adslAtucCurrOutputPwr.19 = Wrong Type (should be INTEGER): Gauge32: 200
ADSL-LINE-MIB::adslAtucCurrAttainableRate.19 = Gauge32: 4686000 bps
ADSL-LINE-MIB::adslAturInvSerialNumber.19 = STRING:
ADSL-LINE-MIB::adslAturInvVendorID.19 = STRING: STMI
ADSL-LINE-MIB::adslAturInvVersionNumber.19 = STRING: 0
ADSL-LINE-MIB::adslAturCurrSnrMgn.19 = Wrong Type (should be INTEGER): Gauge32: 15
ADSL-LINE-MIB::adslAturCurrAtn.19 = Gauge32: 340 tenth dB
ADSL-LINE-MIB::adslAturCurrOutputPwr.19 = Wrong Type (should be INTEGER): Gauge32: 130
ADSL-LINE-MIB::adslAturCurrAttainableRate.19 = Gauge32: 978000 bps
ADSL-LINE-MIB::adslAtucChanCurrTxRate.19 = Gauge32: 4640000 bps
ADSL-LINE-MIB::adslAturChanCurrTxRate.19 = Gauge32: 832000 bps
ADSL-LINE-MIB::adslAtucPerfLofs.19 = Counter32: 453
ADSL-LINE-MIB::adslAtucPerfLoss.19 = Counter32: 13
ADSL-LINE-MIB::adslAtucPerfLprs.19 = Counter32: 0
ADSL-LINE-MIB::adslAtucPerfESs.19 = Counter32: 1997
ADSL-LINE-MIB::adslAtucPerfInits.19 = Counter32: 18
ADSL-LINE-MIB::adslAtucPerfCurr15MinLofs.19 = Gauge32: 0 seconds
ADSL-LINE-MIB::adslAtucPerfCurr15MinLoss.19 = Gauge32: 0 seconds
ADSL-LINE-MIB::adslAtucPerfCurr15MinLprs.19 = Gauge32: 0 seconds
ADSL-LINE-MIB::adslAtucPerfCurr15MinESs.19 = Gauge32: 0 seconds
ADSL-LINE-MIB::adslAtucPerfCurr15MinInits.19 = Gauge32: 0
ADSL-LINE-MIB::adslAtucPerfCurr1DayLofs.19 = Gauge32: 445 seconds
ADSL-LINE-MIB::adslAtucPerfCurr1DayLoss.19 = Gauge32: 14 seconds
ADSL-LINE-MIB::adslAtucPerfCurr1DayLprs.19 = Gauge32: 0 seconds
ADSL-LINE-MIB::adslAtucPerfCurr1DayESs.19 = Gauge32: 1425 seconds
ADSL-LINE-MIB::adslAtucPerfCurr1DayInits.19 = Gauge32: 9
ADSL-LINE-MIB::adslAturPerfLofs.19 = Counter32: 22 seconds
ADSL-LINE-MIB::adslAturPerfLoss.19 = Counter32: 15 seconds
ADSL-LINE-MIB::adslAturPerfLprs.19 = Counter32: 0 seconds
ADSL-LINE-MIB::adslAturPerfESs.19 = Counter32: 95908 seconds
ADSL-LINE-MIB::adslAturPerfCurr15MinLofs.19 = Gauge32: 0 seconds
ADSL-LINE-MIB::adslAturPerfCurr15MinLoss.19 = Gauge32: 0 seconds
ADSL-LINE-MIB::adslAturPerfCurr15MinLprs.19 = Gauge32: 0 seconds
ADSL-LINE-MIB::adslAturPerfCurr15MinESs.19 = Gauge32: 345 seconds
ADSL-LINE-MIB::adslAturPerfCurr1DayLofs.19 = Gauge32: 22 seconds
ADSL-LINE-MIB::adslAturPerfCurr1DayLoss.19 = Gauge32: 15 seconds
ADSL-LINE-MIB::adslAturPerfCurr1DayLprs.19 = Gauge32: 0 seconds
ADSL-LINE-MIB::adslAturPerfCurr1DayESs.19 = Gauge32: 21932 seconds
ADSL-LINE-MIB::adslAtucChanCorrectedBlks.19 = Counter32: 4045523380
ADSL-LINE-MIB::adslAtucChanUncorrectBlks.19 = Counter32: 23129426
ADSL-LINE-MIB::adslAtucChanPerfCurr15MinCorrectedBlks.19 = Gauge32: 6278668
ADSL-LINE-MIB::adslAtucChanPerfCurr15MinUncorrectBlks.19 = Gauge32: 17489
ADSL-LINE-MIB::adslAtucChanPerfCurr1DayCorrectedBlks.19 = Gauge32: 1783723602
ADSL-LINE-MIB::adslAtucChanPerfCurr1DayUncorrectBlks.19 = Gauge32: 10061120
ADSL-LINE-MIB::adslAturChanCorrectedBlks.19 = Counter32: 4235
ADSL-LINE-MIB::adslAturChanUncorrectBlks.19 = Counter32: 9371
ADSL-LINE-MIB::adslAturChanPerfCurr15MinCorrectedBlks.19 = Gauge32: 0
ADSL-LINE-MIB::adslAturChanPerfCurr15MinUncorrectBlks.19 = Gauge32: 0
ADSL-LINE-MIB::adslAturChanPerfCurr1DayCorrectedBlks.19 = Gauge32: 1538
ADSL-LINE-MIB::adslAturChanPerfCurr1DayUncorrectBlks.19 = Gauge32: 5374

Many of the figure in the output of 'show dsl interface atm0' are available via SNMP. For example 'Down Stream Noise Margin' is 'adslAturCurrSnrMgn' quoted in tenths of a decibel. So a reading of 15 means 1.5dB. To get an idea of how these values were changing with time, I used Cacti and RRD to poll the SNMP and produce some nice graphs. I Googles and found a Cacti 'ADSL Line Data' templates that allows you to graph Attenuation, Noise margin, and Output Power. Of those, only Noise Margin is interesting, because I found the Attenuation and Output Power to remain constant. But the thing the template did not graph was the number of uncorrectable errors, labelled 'adslAtucChanUncorrectBlks' in the MIB, and this is what I was really interested in, as when we were having problems with the ADSL line, the count of uncorrectable errors was high.

So I modified '/var/www/cacti/resource/snmp_queries/adslLine.xml' to include another OID. I chose 'ADSL-LINE-MIB::adslAtucChanPerfCurr15MinUncorrectBlks' which the MIB decribes as follows:

# snmptranslate -Td ADSL-LINE-MIB::adslAtucChanPerfCurr15MinUncorrectBlks
ADSL-LINE-MIB::adslAtucChanPerfCurr15MinUncorrectBlks
adslAtucChanPerfCurr15MinUncorrectBlks OBJECT-TYPE
-- FROM ADSL-LINE-MIB
MAX-ACCESS read-only
STATUS current
DESCRIPTION "Count of all blocks received with uncorrectable
errors on this channel within the current 15 minute
interval."

I modified the XML for the data query as follows:

# snmptranslate -On ADSL-LINE-MIB::adslAtucChanPerfCurr15MinUncorrectBlks
.1.3.6.1.2.1.10.94.1.1.10.1.11

# diff -u adslLine.orig.xml adslLine.xml
--- adslLine.orig.xml 2009-05-01 16:57:18.000000000 +0100
+++ adslLine.xml 2009-06-15 18:53:58.000000000 +0100
@@ -27,6 +27,13 @@
<direction>input</direction>
<oid>.1.3.6.1.2.1.10.94.1.1.3.1.2</oid>
</adslAtucInvVendorID>
+ <adslAtuc15mUncrBlks>
+ <name>Local Curr 15min Uncorrectable Blocks</name>
+ <method>walk</method>
+ <source>value</source>
+ <direction>output</direction>
+ <oid>.1.3.6.1.2.1.10.94.1.1.10.1.11</oid>
+ </adslAtuc15mUncrBlks>
<adslAtucCurrSnrMgn>
<name>Local Noise Margin (tenth dB)</name>
<method>walk</method>

When I started to graph this value, I was surprised by the range of values I saw. Often it was low, a few hundred, but some time it would shoot up to 60,000! I realised that to graph this range of values it would be better to use a logarithmic scale.

After graphing uncorrectable errors for a while I began to question what does 'within the current 15 minute interval.' actually mean. So let's look in detail at the OID's related to uncorrectable errors. Here are some (high) starting figures:

# snmpwalk -v1 -c public 192.168.6.1 1.3.6.1.2.1.10.94.1.1.10.1
SNMPv2-SMI::transmission.94.1.1.10.1.3.19 = Counter32: 293561906
SNMPv2-SMI::transmission.94.1.1.10.1.4.19 = Counter32: 1493720
SNMPv2-SMI::transmission.94.1.1.10.1.10.19 = Gauge32: 6671894
SNMPv2-SMI::transmission.94.1.1.10.1.11.19 = Gauge32: 80028
SNMPv2-SMI::transmission.94.1.1.10.1.15.19 = Gauge32: 293561906
SNMPv2-SMI::transmission.94.1.1.10.1.16.19 = Gauge32: 1493720

You can see that "adslAtucChanCorrectedBlks" and "adslAtucChanPerfCurr1DayUncorrectBlks" are identical values, and they were increasing exactly in step, so they are both SNMP 'Counters'. A few minutes latter, we get these reading:

# snmpwalk -v1 -c public 192.168.6.1 1.3.6.1.2.1.10.94.1.1.10.1
SNMPv2-SMI::transmission.94.1.1.10.1.3.19 = Counter32: 293628501
SNMPv2-SMI::transmission.94.1.1.10.1.4.19 = Counter32: 1495372
SNMPv2-SMI::transmission.94.1.1.10.1.10.19 = Gauge32: 27251
SNMPv2-SMI::transmission.94.1.1.10.1.11.19 = Gauge32: 1383
SNMPv2-SMI::transmission.94.1.1.10.1.15.19 = Gauge32: 293628501
SNMPv2-SMI::transmission.94.1.1.10.1.16.19 = Gauge32: 1495372

In the period between taking these readings, the '15Min' figures, have reset to zero, and are counting up again. If you keep a constant watch on these figures, you can see that every 15 minutes they reset back to zero. I found the following command to be useful to monitor errors occuring in real time.

# watch -n 5 'snmpwalk -v1 -c public 192.168.6.1 1.3.6.1.2.1.10.94.1.1.10.1'

So with the benefit of hindsight, I may have been better using Cacti to graph 'adslAtucChanUncorrectBlks' as a 'COUNTER', rather than 'adslAtucChanPerfCurr15MinUncorrectBlks' as a 'GAUGE'.

Eventually we got BT to come on site to check the ADSL line, and they located
a bad connection. Since they fixed that, the noise margin has remained consistently above 5dB and the uncorrected blocks has been very low.

Here are some graph, where you can that the ADSL line has been fixed:

Tuesday, October 07, 2008

A new way to search & browse the OpenSolaris Mail Lists

You can now search and browse through over a quarter million messages posted to the OpenSolaris Mail-lists using a free service called 'MarkMail'.

Head over to http://opensolaris.markmail.org/ and take a look.

I stumbled across the MarkMail web site last week,via Google. It seems to have been running for about 12 months, and in that time they have loaded the mail archives of thousands of open-source projects into their system.

But until last week, they did not include anything on OpenSolaris. A shocking omission! So I left feedback, and I was quickly contacted by MarkMail's Jason Hunter. Jason wasted no time in subscribing their system to most of the OpenSolaris lists, and then within a few days they had loaded the 'MailMan' archive files. (If you would like to see further lists added then leave feedback on the MarkMail site via their form.)

By going to this link http://opensolaris.markmail.org/search/?q= you can see the latest emails that have been added into the system. I find this a useful way of just getting some idea of the vast amount of activity being generated around the OpenSolaris communities and projects. If your browser has 'flash' enabled, you will also see a graph of email activity per month, which shows the story of tremendous growth in interest around OpenSolaris. You will also see some fascinating statistics which tell you which are the most popular mail lists and which individuals have made the most posts.

Tuesday, September 16, 2008

Progress on FISHworks?

Very little has been publicly announced by Sun about the FISHworks project.
After some initial news in early 2007, it seemed to go quiet.

Based on those reports, I would speculate that FISHworks appears to be a project to create a NAS/SAN appliance, based on existing OpenSolaris source code for ZFS, CIFS server, NFS server and the iScsi target, using DTrace observability to instrument under the hood.

At the start of September 2008, some more details about FISHworks emerged
when one of Sun's top engineers Mike Shapiro gave a video interview, bizarrely, on the noisy streets of San Francisco! It's a pity that this interview was conducted outside of a bar, as the background noise is somewhat distracting!

So when can we expect a proper announcement from Sun on FISHworks. Well according to the New York Time, that will be in November.

So do you want to know what progress is being made?

Then I suggest you check out the OpenSolaris bugs database.
This link shows bug activity in the iScsi target code for the last 30 days.

In the 'Reported Against' field, you will often see things like fw_36. This I believe is referring to FISHworks build 36.

Looking at the bugs for the last 30 days, there seems to be extensive & intensive testing happening of the OpenSolaris iScsi target, using various iScsi initiators and various operating systems, like Vista & Linux.

Some of the bugs looks quite obscure, occurring in relatively complex scenarios. Many of the bug comments refer to 'GRITS' and 'Diskomizer'. These seem to be Sun's internals tools for stress testing storage systems. Again, there does not seem to be much public information on these tools, but I googled and found these snippets on GRITS and Diskomizer.

It's good to see that in GRITS and Diskomizer, Sun seems to have the tools that
can expose these bugs. It's also good that these bugs are being fixed quickly.
Sun's Tim Szeto looks to have been particularly busy, and making good progress.

Some of the bug descriptions mention the 'Iwashi NAS appliance'
-So what it that? I would speculate maybe the code name for a Sun appliance?

Wednesday, August 20, 2008

'Java.exe' and Oracle SQL Developer

I've just upgraded to v1.5.1 of 'Oracle SQL Developer'. When you first start the program, it will ask for the path to 'java.exe'. If you need to change or reset that path to a different value, you need to locate text file 'sqldeveloper.conf' and edit the value of setting 'SetJavaHome'. You may want to change this setting if you try to use an 'Unsupported Java Version'.

Friday, August 15, 2008

OpenSolaris and the HP ML-115

The HP ML-115 is a very low cost server, that is ideal for trying out VMware ESX:
"Using the HP ML115 as a Test Server"
"Building a Low Cost (Cheap) VMware ESX Test Server"

So maybe the ML-115 would also be suitable for OpenSolaris.
So I tried it out, but there are some driver issues.
The good news is that these should be fixed very soon.

Problem #1 -If you try to boot the Open Solaris 2008.05 live CD, it fails, with error "Console login service(s) cannot run". See this thread.

The problem is that the ML-115's DVD drive is connected to a SATA port on the nVidia MCP55 chipset, and the nv_sata driver has only support for disk drives and not CD/DVD drives. But now Bug 6595488 is fixed, if you use build 95.

Ok, so with build 95, you should be able to boot from the DVD drive and install OpenSolaris. I tried this with "Solaris Express Community Edition" and was successful. You should then see something like this:

# cfgadm -a | grep sata | grep configured
sata0/0::dsk/c0t0d0 disk connected configured ok
sata1/0::dsk/c4t0d0 disk connected configured ok
sata2/0::dsk/c1t0d0 cd/dvd connected configured ok
# rmformat -l
Looking for devices...
1. Logical Node: /dev/rdsk/c1t0d0p0
Physical Node: /pci@0,0/pci103c,1714@5,2/cdrom@0,0
Connected Device: HL-DT-ST DVD-RAM GH15L FA01
Device Type: DVD Reader/Writer
Size: 2.9 GB
# cdrw -l
Looking for CD devices...
Node Connected Device Device type
----------------------+--------------------------------+-----------------
cdrom0 | HL-DT-ST DVD-RAM GH15L FA01 | CD Reader/Writer
# modinfo | grep nv_sata
45 fffffffff7883000 6458 207 1 nv_sata (Nvidia ck804/mcp55 HBA v1.7)

Problem #2 - X will not run to give a graphical interface. If you check the Xerrors file, you will see:

(EE) MGA(0): Given color and fb depth combination not supported by this driver
(EE) Screen(s) found, but none have a usable configuration.

The graphics chipset on ML-115 is identified as "MGA G200e [Pilot] ServerEngines"
aka "mgag200 SE A PCI" aka "vendor 0x102b device 0x0522" and uses the MGA driver, in file "mga_drv.so". The version of the MGA driver supplier with OpenSolaris is version 1.4.6

I tried booting a Linux LiveCD, GRML v1.1, which uses
version 1.4.7 of the MGA driver and this worked without problem on the ML-115.

The OpenSolaris MGA driver has recently been updated. See Bug 6735095 and that fix should be available in build 97. Let's hope that fixes the X problem.

Problem #3 - No network card detected. The ML-115 network card is identified as
"Broadcom Corporation NetXtreme BCM5722 Gigabit Ethernet PCI Express" aka "vendor 0x14e4 device 0x165a". The fix for this should be available in build 96 - see Bug 6726056.

Updated ISO for OpenSolaris to build 95 are available now. For an update to build 97, I guess that means a wait of a further 4 weeks.

Friday, June 20, 2008

ISDN Caller ID and Cisco ISDN routers

We recently had a ISDN-2e line installed at work. But I forgot to ask BT (British Telecom) for 'caller ID' aka CLID to be enabled. BT terminology for this is the 'Calling Line Identity Presentation' or CLIP service, and BT will only enable it if you specifically ask for it, and there is an additional charge for the service.

Without the number of the calling party, the Cisco router does not know which 'Dialer profile' to associate with the incoming call, and rejects the call.

I kept a record of the log from the Cisco router, before and after enabling CLID, so here we can see the differences.

While doing the testing & troubleshooting, I had debugging enable on the cisco router as follows:

#show debug
Dial on demand:
Dial on demand events debugging is on

The following ISDN debugs are enabled on all DSLs:

debug isdn error is ON.
debug isdn event is ON.
debug isdn q931 is ON.

First an incoming data call, before BT enabled CLID on the ISDN line:

ISDN BR0 Q931: RX <- SETUP pd = 8 callref = 0x01
Sending Complete
Bearer Capability i = 0x8890
Standard = CCITT
Transfer Capability = Unrestricted Digital
Transfer Mode = Circuit
Transfer Rate = 64 kbit/s
Channel ID i = 0x89
Called Party Number i = 0x81, '67xxxx'
Plan:ISDN, Type:Unknown
ISDN BR0 EVENT: process_rxstate: ces/callid 1/0x14 calltype 1 HOST_INCOMING_CALL
ISDN BR0:1: Incoming call rejected, unbindable
ISDN BR0 **ERROR**: host_incoming_call: DIALER ERROR 0x1: b channel 0, call id 0x14
ISDN BR0 EVENT: process_rxstate: ces/callid 1/0x14 calltype 1 HOST_DISCONNECT_ACK
ISDN BR0 Q931: TX -> RELEASE_COMP pd = 8 callref = 0x81
Cause i = 0x8095 - Call rejected

After BT enabled CLID, which was done within a couple of hours, the log shows this:

ISDN BR0 Q931: RX <- SETUP pd = 8 callref = 0x01
Sending Complete
Bearer Capability i = 0x8890
Standard = CCITT
Transfer Capability = Unrestricted Digital
Transfer Mode = Circuit
Transfer Rate = 64 kbit/s
Channel ID i = 0x89
Calling Party Number i = 0x2183, '142761xxxx'
Plan:ISDN, Type:National
Called Party Number i = 0x81, '67xxxx'
Plan:ISDN, Type:Unknown
ISDN BR0 EVENT: process_rxstate: ces/callid 1/0x17 calltype 1 HOST_INCOMING_CALL
BR0:1 DDR: Caller id 142761xxxx matched to profile
%DIALER-6-BIND: Interface BR0:1 bound to profile Di1

I also tried making a voice call to the ISDN number, which I knew would fail, just to see what it looked like. Here is before CLID was enabled:

ISDN BR0 Q931: RX <- SETUP pd = 8 callref = 0x01
Sending Complete
Bearer Capability i = 0x8090A3
Standard = CCITT
Transfer Capability = Speech
Transfer Mode = Circuit
Transfer Rate = 64 kbit/s
Channel ID i = 0x89
Called Party Number i = 0x81, '67xxxx'
Plan:ISDN, Type:Unknown
ISDN BR0 EVENT: process_rxstate: ces/callid 1/0x11 calltype 2 HOST_INCOMING_CALL
ISDN BR0 **ERROR**: host_incoming_call: Received a call with a bad bearer cap from <unknown> on B1

And here is a voice call, after CLID enabled:

ISDN BR0 Q931: RX <- SETUP pd = 8 callref = 0x01
Sending Complete
Bearer Capability i = 0x8090A3
Standard = CCITT
Transfer Capability = Speech
Transfer Mode = Circuit
Transfer Rate = 64 kbit/s
Channel ID i = 0x89
Calling Party Number i = 0x2183, '142768xxxx'
Plan:ISDN, Type:National
Called Party Number i = 0x81, '67xxxx'
Plan:ISDN, Type:Unknown
ISDN BR0 EVENT: process_rxstate: ces/callid 1/0x19 calltype 2 HOST_INCOMING_CALL
ISDN BR0 **ERROR**: host_incoming_call: Received a call with a bad bearer cap from 142768xxxx on B1

By the way, the Cisco router was a 1721 using IOS v12.3

#show version
Cisco Internetwork Operating System Software
IOS (tm) C1700 Software (C1700-Y-M), Version 12.3(26), RELEASE SOFTWARE (fc2)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2008 by cisco Systems, Inc.
Compiled Mon 17-Mar-08 14:24 by dchih

ROM: System Bootstrap, Version 12.2(7r)XM1, RELEASE SOFTWARE (fc1)
System returned to ROM by power-on
System restarted at 12:53:16 GMT Wed May 21 2008
System image file is "flash:c1700-y-mz.123-26.bin"

cisco 1721 (MPC860P) processor (revision 0x100) with 28231K/4537K bytes of memory.
MPC860P processor: part number 5, mask 2
Bridging software.
X.25 software, Version 3.0.0.
Basic Rate ISDN software, Version 1.1.
1 FastEthernet/IEEE 802.3 interface(s)
1 Serial(sync/async) network interface(s)
1 ISDN Basic Rate interface(s)
32K bytes of non-volatile configuration memory.
16384K bytes of processor board System flash (Read/Write)

Thursday, June 12, 2008

Oracle, Access & ODBC: Problems & Solution

I was using Microsoft Access, in this case Access 97, to connect to an Oracle 10g database, using the ability of Access to set linked tables using an ODBC data source.

For the client, I was using the Oracle Instant Client v10.1.0.4 and setting an environmental variable NLS_LANG.

I was seeing two problems. First, when trying to link to the tables/views on the Oracle server, after selecting the ODBC data source DSN, Access would list all the tables/views on the Oracle server, but It would only display the first character of the name of the tables/views, making it impossible to find the correct table/view.

Second, when I tried setting up the Oracle linked tables in the Access MDB file on another PC, where everything was working ok, and then going back to the original problem PC and opening same MDB file, when opening the Oracle linked tables, all rows and all fields contained #Deleted.

I did a Google and found this Microsoft KnowledgeBase article - Q913070:
#Deleted is displayed in the records when you open a linked ODBC table from an Oracle 10g database in Access 2003, in Access 2002, or in Access 2000


To cut a long story short, I had set the client side environmental variable
NLS_LANG = ENGLISH_UNITED KINGDOM.WE8ISO8859P1

but this did not match the setting on the Oracle database, and I was advised
by our Oracle DBA that I should have been using:
NLS_LANG = ENGLISH_UNITED KINGDOM.WE8MSWIN1252


Apparently, our Oracle DBA had changed the NLS_CHARACTERSET of the database at some stage in the past.

After I corrected the value of NLS_LANG, everything starting working ok. It seems WE8MSWIN1252 is a binary super set of WE8ISO8859P1.

See these links for further background: