Thursday, August 30, 2007

WMI and "Permission denied"

Today we were trying to use VBScript and WMI to audit the software installed on PCs connected to our network. The PCs were running Windows XP Pro, and were members of an Active Directory domain.

The script worked fine in most cases, but we occasionally got errors like these:

Error 462 on GetObject()
"The remote server machine does not exist or is unavailable"

Error 70 on GetObject()
"Permission denied"


The line of code in question was:
Set objReg = GetObject("winmgmts:{impersonationLevel=impersonate,authenticationLevel=Pkt}!//" & strComputer & sDNS & "/root/default:StdRegProv")


To cut a long story short, we used Wireshark to capture the conversation between the source and target PCs.

Here is an RPC packet sent from the source to the target PC.
The target PC is "PC-ONE-LAP".

DCE RPC Bind, Fragment: Single, FragLen: 1440, Call: 2
Version: 5
Version (minor): 0
Packet type: Bind (11)
Packet Flags: 0x03
Data Representation: 10000000
Frag Length: 1440
Auth Length: 1360
Call ID: 2
Max Xmit Frag: 5840
Max Recv Frag: 5840
Assoc Group: 0x0000b254
Num Ctx Items: 1
Ctx Item[1]: ID:1
Context ID: 1
Num Trans Items: 1
Abstract Syntax: ISystemActivator V0.0
Interface: ISystemActivator UUID: 000001a0-0000-0000-c000-000000000046
Interface Ver: 0
Interface Ver Minor: 0
Transfer Syntax[1]: 8a885d04-1ceb-11c9-9fe8-08002b104860 V2
Auth type: SPNEGO (9)
Auth level: Connect (2)
Auth pad len: 0
Auth Rsrvd: 0
Auth Context ID: 1277840
GSS-API Generic Security Service Application Program Interface
OID: 1.3.6.1.5.5.2 (SPNEGO - Simple Protected Negotiation)
SPNEGO
negTokenInit
mechTypes: 3 items
Item: 1.2.840.48018.1.2.2 (MS KRB5 - Microsoft Kerberos 5)
Item: 1.2.840.113554.1.2.2 (KRB5 - Kerberos 5)
Item: 1.3.6.1.4.1.311.2.2.10 (NTLMSSP - Microsoft NTLM Security Support Provider)
mechToken: 6E82050A30820506A003020105A10302010EA20703050020...
krb5_blob: 6E82050A30820506A003020105A10302010EA20703050020...
Kerberos AP-REQ
Pvno: 5
MSG Type: AP-REQ (14)
Padding: 0
APOptions: 20000000 (Mutual required)
Ticket
Tkt-vno: 5
Realm: COMPANY.NET
Server Name (Service and Instance): RPCSS/PC-ONE-LAP.company.net
enc-part rc4-hmac
Authenticator rc4-hmac


You can see that it is using Kerberos to authenticate.
Here is the response from the target PC:

DCE RPC Bind_ack, Fragment: Single, FragLen: 199, Call: 2
Version: 5
Version (minor): 0
Packet type: Bind_ack (12)
Packet Flags: 0x03
Data Representation: 10000000
Frag Length: 199
Auth Length: 131
Call ID: 2
Max Xmit Frag: 5840
Max Recv Frag: 5840
Assoc Group: 0x0000b254
Scndry Addr len: 4
Scndry Addr: 135
Num results: 1
Context ID[1]
Auth type: SPNEGO (9)
Auth level: Connect (2)
Auth pad len: 0
Auth Rsrvd: 0
Auth Context ID: 1277840
GSS-API Generic Security Service Application Program Interface
SPNEGO
negTokenTarg
negResult: accept-incomplete (1)
supportedMech: 1.2.840.48018.1.2.2 (MS KRB5 - Microsoft Kerberos 5)
responseToken: 606606092A864886F71201020203007E573055A003020105...
krb5_blob: 606606092A864886F71201020203007E573055A003020105...
KRB5 OID: 1.2.840.113554.1.2.2 (KRB5 - Kerberos 5)
krb5_tok_id: KRB5_ERROR (0x0003)
Kerberos KRB-ERROR
Pvno: 5
MSG Type: KRB-ERROR (30)
stime: 2007-08-30 10:19:06 (Z)
susec: 851504
error_code: KRB5KRB_AP_ERR_MODIFIED (41)
Realm: COMPANY.NET
Server Name (Principal): PC-TWO-LAP$

The response shows a Kerberos error, "KRB5KRB_AP_ERR_MODIFIED". The target is basically saying: hold on, my name is "PC-TWO-LAP" and not "PC-ONE-LAP".

In the System Event Log for the source PC, you will see the following:
Event Type: Error
Event Source: DCOM
Event Category: None
Event ID: 10009
Date: 30/08/2007
Time: 10:22:52
User: COMPANY\USER
Computer: SOURCEPC
Description:
DCOM was unable to communicate with the computer PC-ONE-LAP.company.net
using any of the configured protocols.

Event Type: Error
Event Source: Kerberos
Event Category: None
Event ID: 4
Date: 30/08/2007
Time: 10:22:52
User: N/A
Computer: SOURCEPC
Description:
The kerberos client received a KRB_AP_ERR_MODIFIED error from the
server PC-TWO-LAP$. This indicates that the password used to encrypt
the kerberos service ticket is different than that on the target server.
Commonly, this is due to identically named machine accounts in the
target realm (COMPANY.NET), and the client realm.

The explanation is that the DNS server was returning the same IP address for both "PC-ONE-LAP" and "PC-TWO-LAP". Only "PC-TWO-LAP" was actually connected to the network, but when we tried "PC-ONE-LAP", it was actually "PC-TWO-LAP" that responded.

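Kerberos authenticates both the User and the Computer, so that is why we got the error: the service ticket was issued for "RPCSS/PC-ONE-LAP.company.net", but the machine that received it was "PC-TWO-LAP", which could not decrypt it with its own machine account key. If you get this error, then check your DNS records.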
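
As a rough illustration of the DNS check, here is a minimal Python sketch that looks for IP addresses claimed by more than one hostname. The addresses and the third hostname are made up for the example; in practice the mapping could be built by resolving each name (for instance with socket.gethostbyname) or from an export of the DNS zone.

```python
from collections import defaultdict


def find_shared_addresses(records):
    """Return {ip: [hostnames]} for every IP that more than one hostname maps to."""
    by_ip = defaultdict(list)
    for host, ip in records.items():
        by_ip[ip].append(host)
    return {ip: sorted(hosts) for ip, hosts in by_ip.items() if len(hosts) > 1}


# Hypothetical A records (illustrative addresses, not taken from the real network).
records = {
    "PC-ONE-LAP.company.net": "10.0.0.25",
    "PC-TWO-LAP.company.net": "10.0.0.25",
    "PC-THREE-LAP.company.net": "10.0.0.31",
}

print(find_shared_addresses(records))
```

Any address reported here is a candidate for a stale record of the kind that caused the KRB_AP_ERR_MODIFIED error.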

For other similar problems you may find this link useful:
Failed to get remote resources: Remote server is unavailable. The RPC server is unavailable.

Monday, August 27, 2007

A problem with wget on OpenSolaris

Ok, here is a stupid little problem. I wasted an hour trying to figure this one out. Maybe this will help someone to avoid the same mistake!

I was using 'wget' to try to download a file from an HTTP server onto my OpenSolaris PC. Here is what I did:
# uname -a
SunOS solaris 5.11 snv_60 i86pc i386 i86pc
# wget -V
GNU Wget 1.10.2
# cd /home
# /usr/sfw/bin/wget http://www.nwsmith.net/index.htm
--18:31:13-- http://www.nwsmith.net/index.htm
=> `index.htm'
Resolving www.nwsmith.net... NNN.NNN.NNN.NNN
Connecting to www.nwsmith.net|NNN.NNN.NNN.NNN|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11,774 (11K) [text/html]
index.htm: Operation not applicable
Cannot write to `index.htm' (Operation not applicable).

Did you spot my mistake?
I must have thought I was using a Linux PC, because then the '/home' directory would have been fine. But on Solaris...
# ls -ld /home
dr-xr-xr-x 1 root root 1 Apr 18 00:03 /home

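...you cannot write to that directory. By default on Solaris, '/home' is a mount point managed by the automounter (autofs), so files cannot be created directly in it.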
Choose a directory that is writeable, and then wget will work without error.
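
If you want to test a directory before starting a download, one option is simply to try creating a scratch file in it. This is only a sketch in Python; the function name can_write_to is made up for the example.

```python
import tempfile


def can_write_to(directory):
    """Return True if a scratch file can actually be created in 'directory'."""
    try:
        # The file is removed automatically when the 'with' block ends.
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        return False


print(can_write_to(tempfile.gettempdir()))
```

Trying a real write is more reliable than reading the permission bits, because a directory can refuse new files for reasons that 'ls -ld' does not show.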

Thursday, August 16, 2007

fdisk, sfdisk, mdadm and SCSI hard drive geometry

At work, one of our servers uses Linux software RAID, and we have two mirrored hard drives set up as RAID1.
Smartmontools reported that one of the hard drives was starting to fail:
# smartctl -a /dev/sda
SMART Health Status: LOGICAL UNIT FAILURE PREDICTION THRESHOLD EXCEEDED [asc=5d,ascq=2]
# smartctl -l selftest /dev/sda
SMART Self-test log
Num Test Status segment LifeTime LBA_first_err [SK ASC ASQ]
Description number (hours)
# 1 Background long Failed in segment --> 2 10572 0x 22f0e6c [0x4 0x40 0x85]
# 2 Background long Failed in segment --> 2 10404 0x 22f0e6c [0x3 0x11 0x0]
# 3 Background long Failed in segment --> 2 10236 0x 22f0e63 [0x4 0x40 0x85]

So we executed the following commands:
mdadm --manage /dev/md0 --set-faulty  /dev/sda1
mdadm --manage /dev/md0 --remove /dev/sda1
mdadm --manage /dev/md1 --set-faulty /dev/sda2
mdadm --manage /dev/md1 --remove /dev/sda2
mdadm --manage /dev/md2 --set-faulty /dev/sda3
mdadm --manage /dev/md2 --remove /dev/sda3
mdadm --manage /dev/md3 --set-faulty /dev/sda5
mdadm --manage /dev/md3 --remove /dev/sda5

We then hot-unplugged the drive.
When the replacement drive arrived, it was an identical model to the original and had an identical total size, but it reported a different geometry.
[root@ifsclstr02 ~]# smartctl -i /dev/sda
Device: IBM-ESXS BBD036C3ESTT0ZFN Version: JP86
Device type: disk
Transport protocol: Parallel SCSI (SPI-4)

[root@ifsclstr02 ~]# smartctl -i /dev/sdb
Device: IBM-ESXS BBD036C3ESTT0ZFN Version: JP85
Device type: disk
Transport protocol: Parallel SCSI (SPI-4)

# fdisk -l /dev/sda
Disk /dev/sda: 36.4 GB, 36401479680 bytes
64 heads, 32 sectors/track, 34715 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

# fdisk -l /dev/sdb
Disk /dev/sdb: 36.4 GB, 36401479680 bytes
255 heads, 63 sectors/track, 4425 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

The potential problem was that if we had to specify the start and end of the partition in terms of cylinders, then we would not be able to get an exact match in the size of the partitions between the new disk and the existing working half of the mirror.
After some googling, we concluded that having to align the partition boundaries with the cylinders was a DOS legacy issue, and was not something that would cause a problem for Linux.
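
To see why the cylinder units cannot match, here is a small Python sketch that converts an absolute sector number into a cylinder number and an offset for each geometry. The sector 208845 is where the second partition starts in the sfdisk listings in this post; the function names are just for illustration.

```python
def sectors_per_cylinder(heads, sectors_per_track):
    """Number of 512-byte sectors in one cylinder for a given geometry."""
    return heads * sectors_per_track


def cylinder_position(sector, heads, sectors_per_track):
    """Return (cylinder, offset_in_sectors) for an absolute sector number."""
    return divmod(sector, sectors_per_cylinder(heads, sectors_per_track))


start = 208845  # start sector of the second partition

print(cylinder_position(start, 255, 63))  # (13, 0): exactly on a cylinder boundary
print(cylinder_position(start, 64, 32))   # (101, 1997): part-way through cylinder 101
```

The same sector is a clean boundary under 255 heads and 63 sectors/track, but falls in the middle of a cylinder under 64 heads and 32 sectors/track, which is all the later "does not start at a cylinder boundary" warnings are saying.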
So to copy the partitions from the working disk to the new disk we used the following:
# sfdisk -d /dev/sdb | sfdisk --Linux /dev/sda
Checking that no-one is using this disk right now ...
OK
Disk /dev/sda: 34715 cylinders, 64 heads, 32 sectors/track
Old situation:
Units = cylinders of 1048576 bytes, blocks of 1024 bytes, counting from 0

Device Boot Start End #cyls #blocks Id System
/dev/sda1 0 - 0 0 0 Empty
/dev/sda2 0 - 0 0 0 Empty
/dev/sda3 0 - 0 0 0 Empty
/dev/sda4 0 - 0 0 0 Empty
New situation:
Units = sectors of 512 bytes, counting from 0

Device Boot Start End #sectors Id System
/dev/sda1 * 63 208844 208782 fd Linux raid autodetect
/dev/sda2 208845 16980704 16771860 fd Linux raid autodetect
/dev/sda3 16980705 25366634 8385930 fd Linux raid autodetect
/dev/sda4 25366635 71087624 45720990 5 Extended
/dev/sda5 25366698 71087624 45720927 fd Linux raid autodetect
Warning: partition 1 does not end at a cylinder boundary
Warning: partition 2 does not start at a cylinder boundary
Warning: partition 2 does not end at a cylinder boundary
Warning: partition 3 does not start at a cylinder boundary
Warning: partition 3 does not end at a cylinder boundary
Warning: partition 4 does not start at a cylinder boundary
Warning: partition 4 does not end at a cylinder boundary
Warning: partition 5 does not end at a cylinder boundary
Successfully wrote the new partition table
Re-reading the partition table ...
# mdadm --manage /dev/md0 --add /dev/sda1
# mdadm --manage /dev/md1 --add /dev/sda2
# mdadm --manage /dev/md2 --add /dev/sda3
# mdadm --manage /dev/md3 --add /dev/sda5
# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda2[0] sdb2[1]
8385856 blocks [2/2] [UU]

md2 : active raid1 sda3[2] sdb3[1]
4192896 blocks [2/1] [_U]
resync=DELAYED
md3 : active raid1 sda5[2] sdb5[1]
22860352 blocks [2/1] [_U]
[============>........] recovery = 61.8% (14148928/22860352) finish=2.1min speed=67707K/sec
md0 : active raid1 sda1[0] sdb1[1]
104320 blocks [2/2] [UU]

unused devices: <none>
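
If you want to watch the rebuild from a script rather than by eye, the status flags such as [UU] and [_U] are easy to pick out. The following Python sketch is an assumption about how one might do it, not part of mdadm; it is fed a trimmed copy of the output above.

```python
import re


def degraded_arrays(mdstat_text):
    """Return the md arrays whose member status (e.g. [_U]) shows a missing device."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Matches only flags made of 'U' and '_', so [2/1] and progress bars are ignored.
        status = re.search(r"\[([U_]+)\]", line)
        if current and status and "_" in status.group(1):
            degraded.append(current)
    return degraded


sample = """Personalities : [raid1]
md1 : active raid1 sda2[0] sdb2[1]
8385856 blocks [2/2] [UU]
md2 : active raid1 sda3[2] sdb3[1]
4192896 blocks [2/1] [_U]
md3 : active raid1 sda5[2] sdb5[1]
22860352 blocks [2/1] [_U]
md0 : active raid1 sda1[0] sdb1[1]
104320 blocks [2/2] [UU]
"""

print(degraded_arrays(sample))  # ['md2', 'md3']
```

On a live system the text would come from reading /proc/mdstat; once the list is empty, all of the mirrors are back in sync.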

Using the sfdisk command, you can specify the unit of measure when listing the partition table. Use '-uS' for Sectors and '-uC' for Cylinders:
# sfdisk -l -uS /dev/sda
Disk /dev/sda: 34715 cylinders, 64 heads, 32 sectors/track
Warning: extended partition does not start at a cylinder boundary.
DOS and Linux will interpret the contents differently.
Units = sectors of 512 bytes, counting from 0

Device Boot Start End #sectors Id System
/dev/sda1 * 63 208844 208782 fd Linux raid autodetect
/dev/sda2 208845 16980704 16771860 fd Linux raid autodetect
/dev/sda3 16980705 25366634 8385930 fd Linux raid autodetect
/dev/sda4 25366635 71087624 45720990 5 Extended
/dev/sda5 25366698 71087624 45720927 fd Linux raid autodetect

# sfdisk -l -uS /dev/sdb
Disk /dev/sdb: 4425 cylinders, 255 heads, 63 sectors/track
Units = sectors of 512 bytes, counting from 0

Device Boot Start End #sectors Id System
/dev/sdb1 * 63 208844 208782 fd Linux raid autodetect
/dev/sdb2 208845 16980704 16771860 fd Linux raid autodetect
/dev/sdb3 16980705 25366634 8385930 fd Linux raid autodetect
/dev/sdb4 25366635 71087624 45720990 5 Extended
/dev/sdb5 25366698 71087624 45720927 fd Linux raid autodetect

It's only when you think in terms of cylinders that there appears to be a problem:
# sfdisk -l -uC /dev/sda
Disk /dev/sda: 34715 cylinders, 64 heads, 32 sectors/track
Warning: extended partition does not start at a cylinder boundary.
DOS and Linux will interpret the contents differently.
Units = cylinders of 1048576 bytes, blocks of 1024 bytes, counting from 0

Device Boot Start End #cyls #blocks Id System
/dev/sda1 * 0+ 101- 102- 104391 fd Linux raid autodetect
/dev/sda2 101+ 8291- 8190- 8385930 fd Linux raid autodetect
/dev/sda3 8291+ 12386- 4095- 4192965 fd Linux raid autodetect
/dev/sda4 12386+ 34710- 22325- 22860495 5 Extended
/dev/sda5 12386+ 34710- 22325- 22860463+ fd Linux raid autodetect

# sfdisk -l -uC /dev/sdb
Disk /dev/sdb: 4425 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

Device Boot Start End #cyls #blocks Id System
/dev/sdb1 * 0+ 12 13- 104391 fd Linux raid autodetect
/dev/sdb2 13 1056 1044 8385930 fd Linux raid autodetect
/dev/sdb3 1057 1578 522 4192965 fd Linux raid autodetect
/dev/sdb4 1579 4424 2846 22860495 5 Extended
/dev/sdb5 1579+ 4424 2846- 22860463+ fd Linux raid autodetect

The pluses and minuses just mean that the numbers are not exact and have been rounded: a plus means the real value is slightly larger than shown, and a minus means it is slightly smaller.
For comparison, here is what fdisk reports:
# fdisk -l /dev/sda
Disk /dev/sda: 36.4 GB, 36401479680 bytes
64 heads, 32 sectors/track, 34715 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

Device Boot Start End Blocks Id System
/dev/sda1 * 1 102 104391 fd Linux raid autodetect
Partition 1 does not end on cylinder boundary.
/dev/sda2 102 8292 8385930 fd Linux raid autodetect
/dev/sda3 8292 12387 4192965 fd Linux raid autodetect
/dev/sda4 12387 34711 22860495 5 Extended
/dev/sda5 12387 34711 22860463+ fd Linux raid autodetect

# fdisk -l /dev/sdb
Disk /dev/sdb: 36.4 GB, 36401479680 bytes
255 heads, 63 sectors/track, 4425 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 * 1 13 104391 fd Linux raid autodetect
/dev/sdb2 14 1057 8385930 fd Linux raid autodetect
/dev/sdb3 1058 1579 4192965 fd Linux raid autodetect
/dev/sdb4 1580 4425 22860495 5 Extended
/dev/sdb5 1580 4425 22860463+ fd Linux raid autodetect

Wednesday, August 08, 2007

Smartmontools and fixing Unreadable Disk Sectors

Smartmontools was showing some problems on the disk.
At least two bad LBAs:
# smartctl -l selftest /dev/hda
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 20% 1596 44724966
# 2 Extended offline Completed: read failure 40% 1519 12622427

# smartctl -A /dev/hda | egrep 'Reallocated|Pending|Uncorrectable'
5 Reallocated_Sector_Ct 0x0033 253 253 063 Pre-fail Always - 2
196 Reallocated_Event_Count 0x0008 252 252 000 Old_age Offline - 1
197 Current_Pending_Sector 0x0008 253 253 000 Old_age Offline - 2
198 Offline_Uncorrectable 0x0008 252 252 000 Old_age Offline - 1

I found a document:
"SHOWS HOW TO IDENTIFY THE FILE ASSOCIATED
WITH AN UNREADABLE DISK SECTOR, AND HOW TO
FORCE THAT SECTOR TO REALLOCATE."

and followed the procedure.

Note that the LBA values are given as decimal values. The document seems to refer
to an older version of smartctl that gives the LBA as a hexadecimal number.

Let's look at the partition boundaries to see where these LBAs fall.
# fdisk -lu /dev/hda

Disk /dev/hda: 255 heads, 63 sectors, 3738 cylinders
Units = sectors of 1 * 512 bytes

Device Boot Start End Blocks Id System
/dev/hda1 * 63 160649 80293+ 83 Linux
/dev/hda2 160650 1204874 522112+ 82 Linux swap
/dev/hda3 1204875 53737424 26266275 83 Linux
/dev/hda4 53737425 60050969 3156772+ f Win95 Ext'd (LBA)
/dev/hda5 53737488 55841939 1052226 83 Linux
/dev/hda6 55842003 57946454 1052226 83 Linux
/dev/hda7 57946518 60050969 1052226 83 Linux

Ok, so both bad LBAs are in '/dev/hda3'.
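
The lookup is just a range check against the start and end sectors in the fdisk listing. As a sketch in Python (the table is typed in by hand from the output above, and the extended partition hda4 is left out because it is only a container):

```python
# (device, first sector, last sector) copied from 'fdisk -lu /dev/hda'
PARTITIONS = [
    ("/dev/hda1", 63, 160649),
    ("/dev/hda2", 160650, 1204874),
    ("/dev/hda3", 1204875, 53737424),
    ("/dev/hda5", 53737488, 55841939),
    ("/dev/hda6", 55842003, 57946454),
    ("/dev/hda7", 57946518, 60050969),
]


def partition_for_lba(lba, partitions=PARTITIONS):
    """Return the device whose sector range contains 'lba', or None if there is none."""
    for name, start, end in partitions:
        if start <= lba <= end:
            return name
    return None


for bad_lba in (12622427, 44724966):
    print(bad_lba, partition_for_lba(bad_lba))
```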
What's mounted there? As the partitions are labeled, we need to use:
# grep `e2label /dev/hda3` /etc/fstab
LABEL=/var /var ext3 defaults 1 2

Ok, so the problem is in '/var'.
# tune2fs -l /dev/hda3 | grep Block
Block count: 6566568
Block size: 4096

Let's do the maths:
(LBA 12622427 - partition start 1204875) multiplied by (512/4096) equals 1427194.
(LBA 44724966 - partition start 1204875) multiplied by (512/4096) equals 5440011.375, so the sector is in block 5440011.
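
The same arithmetic as a small Python sketch, using integer division so that the fractional part shows up as a sector offset within the block. The function name is made up for the example.

```python
def lba_to_fs_block(lba, partition_start, block_size, sector_size=512):
    """Return (filesystem block, sector offset inside that block) for a disk LBA."""
    sectors_per_block = block_size // sector_size
    return divmod(lba - partition_start, sectors_per_block)


print(lba_to_fs_block(12622427, 1204875, 4096))  # (1427194, 0)
print(lba_to_fs_block(44724966, 1204875, 4096))  # (5440011, 3)
```

An offset of 3 sectors out of 8 is the .375 in the hand calculation; it is the block number that gets passed to 'icheck'.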

Ok, now let's use 'debugfs':
# debugfs
debugfs 1.27 (8-Mar-2002)
debugfs: open /dev/hda3
debugfs: icheck 1427194
Block Inode number
1427194 526482
debugfs: ncheck 526482
Inode Pathname
526482 /log/ntp/peers.20070717
debugfs: icheck 5440011
icheck: Can't read next inode while doing inode scan
debugfs: quit

So that means LBA 12622427 is in file "/var/log/ntp/peers.20070717".
And it looks like LBA 44724966 is in currently unused space on the disk.

As this file is not critical, I will just overwrite the affected block with zeros
to force the bad sector to be reallocated:
# dd if=/dev/zero of=/dev/hda3 bs=4096 count=1 seek=1427194
1+0 records in
1+0 records out
# sync
# smartctl -A /dev/hda | egrep 'Reallocated|Pending|Uncorrectable'
5 Reallocated_Sector_Ct 0x0033 253 253 063 Pre-fail Always - 1
196 Reallocated_Event_Count 0x0008 252 252 000 Old_age Offline - 1
197 Current_Pending_Sector 0x0008 253 253 000 Old_age Offline - 1
198 Offline_Uncorrectable 0x0008 252 252 000 Old_age Offline - 1

Ok, that seems to have made that error go away for the time being.
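
Comparing the raw values before and after by eye is easy to get wrong, so here is a rough Python sketch that pulls the raw value (the last column) out of lines in the format shown above. It assumes the ten-column layout of 'smartctl -A' as pasted in this post and skips any line where the raw value is not a plain number.

```python
def parse_smart_attributes(text):
    """Map attribute name -> raw value for lines in 'smartctl -A' table format."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # ID, name, flag, value, worst, thresh, type, updated, when_failed, raw
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


before = "197 Current_Pending_Sector 0x0008 253 253 000 Old_age Offline - 2"
after = "197 Current_Pending_Sector 0x0008 253 253 000 Old_age Offline - 1"

pending_before = parse_smart_attributes(before)["Current_Pending_Sector"]
pending_after = parse_smart_attributes(after)["Current_Pending_Sector"]
print(pending_before, "->", pending_after)
```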

Then while googling I found this Perl script, which helps to automate
the handling of bad blocks on Linux:

"smartfixdisk - assistant that helps to repair bad LBAs detected by Smartmontools"

developed by the "IT-Support-Group" (ISG.EE),
which is a service organisation of the
"Department of Information Technology and Electrical Engineering" (D-ITET)
of the "Swiss Federal Institute of Technology", Zurich.

I wanted to use it on an old Red Hat 9 server.
The script immediately fell over on this line:
open(DISKEND,"</sys/block/$diskname/size") or die "$!";

Not too surprising, as '/sys' does not exist on my old server!
It is a feature of newer (2.6) kernels.
On a CentOS 5 box I tried this:
# cat /proc/ide/hda/capacity
78165360
# cat /sys/block/hda/size
78165360
# cat /proc/ide/hda/geometry
physical 16383/16/63
logical 65535/16/63

So DISKEND seems to be related to the number of sectors on the hard drive.
The '/proc' version was available on the old Red Hat 9 server,
so I changed the line of Perl code like this:
open(DISKEND,"</proc/ide/$diskname/capacity") or die "$!";

..and it was happy.
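
Rather than hard-coding one path or the other, the lookup could try both locations. This Python sketch only illustrates the idea (the script itself is Perl and was changed as shown above); the function name and the sys_root/proc_root parameters are made up so that the demo can run against a temporary directory instead of a real disk.

```python
import os
import tempfile


def disk_size_sectors(diskname, sys_root="/sys", proc_root="/proc"):
    """Read the disk size in sectors, preferring /sys and falling back to /proc/ide."""
    candidates = [
        os.path.join(sys_root, "block", diskname, "size"),
        os.path.join(proc_root, "ide", diskname, "capacity"),
    ]
    for path in candidates:
        try:
            with open(path) as fh:
                return int(fh.read().strip())
        except OSError:
            continue
    raise FileNotFoundError("no size file found for " + diskname)


# Demo: fake an old system that has /proc/ide but no /sys.
with tempfile.TemporaryDirectory() as root:
    ide_dir = os.path.join(root, "proc", "ide", "hda")
    os.makedirs(ide_dir)
    with open(os.path.join(ide_dir, "capacity"), "w") as fh:
        fh.write("78165360\n")
    demo_size = disk_size_sectors(
        "hda",
        sys_root=os.path.join(root, "sys"),
        proc_root=os.path.join(root, "proc"),
    )

print(demo_size)
```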
To figure out what the script is doing, it's useful to add a few
'print' commands to the script, or just uncomment the ones that
are already in place. Here's what the script told me on this server:
# ./smartfixdisk.pl --noaction /dev/hda
Block size = 4096, factor = 0.125
Searching for inode... this may take a while...

LBA 12622427
Partition and partition type: /dev/hda3 Linux_Ext2
Status: used
Comment: EXT2/3: File found at inode 526482: /log/ntp/peers.20070717

LBA 44724966
Partition and partition type: /dev/hda3 Linux_Ext2
Status: free
Comment: block not used in filesystem
dd if=/dev/zero of=/dev/hda seek=5590620 bs=4096 count=1 conv=sync

Looks good.
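
Before running a dd that writes to the raw disk, it is worth double-checking which sectors it will actually overwrite. This Python sketch works out the LBA range covered by a single block for a given 'seek' and 'bs'; the function name is made up, and device_start is the first sector of the device being written to (0 for the whole disk, 1204875 for /dev/hda3).

```python
def sectors_covered(seek, bs, device_start=0, sector_size=512):
    """Return the range of absolute LBAs written by 'dd count=1' with this seek and bs."""
    per_block = bs // sector_size
    first = device_start + seek * per_block
    return range(first, first + per_block)


# The script's suggestion: write to the whole disk /dev/hda.
whole_disk = sectors_covered(5590620, 4096)
# The earlier manual command: write to the partition /dev/hda3.
inside_hda3 = sectors_covered(1427194, 4096, device_start=1204875)

print(44724966 in whole_disk)   # True
print(12622427 in inside_hda3)  # True
```

Note that the script's seek value is relative to the start of the disk, while the manual command earlier was relative to the start of the partition, which is why the two numbers look so different.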
Ok, so let's finish off:
# dd if=/dev/zero of=/dev/hda seek=5590620 bs=4096 count=1 conv=sync
1+0 records in
1+0 records out
# sync
# smartctl -A /dev/hda | egrep 'Reallocated|Pending|Uncorrectable'
5 Reallocated_Sector_Ct 0x0033 253 253 063 Pre-fail Always - 1
196 Reallocated_Event_Count 0x0008 252 252 000 Old_age Offline - 1
197 Current_Pending_Sector 0x0008 253 253 000 Old_age Offline - 0
198 Offline_Uncorrectable 0x0008 252 252 000 Old_age Offline - 1

# smartctl -t long /dev/hda

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 17 minutes for test to complete.
Test will complete after Wed Aug 8 16:38:15 2007

Use smartctl -X to abort test.
# smartctl -l selftest /dev/hda
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 1599 -
# 2 Extended offline Completed: read failure 20% 1596 44724966
# 3 Extended offline Completed: read failure 40% 1519 12622427

# smartctl -A /dev/hda | egrep 'Reallocated|Pending|Uncorrectable'
5 Reallocated_Sector_Ct 0x0033 253 253 063 Pre-fail Always - 1
196 Reallocated_Event_Count 0x0008 252 252 000 Old_age Offline - 1
197 Current_Pending_Sector 0x0008 253 253 000 Old_age Offline - 0
198 Offline_Uncorrectable 0x0008 253 252 000 Old_age Offline - 0

Ok, that looks to have cleared the errors for the time being.
But I'm going to keep a careful eye on that disk, using smartd and logwatch.