Tuesday, October 21, 2014

PowerShell: Managing User Profiles on Remote Machines

 Introduction

If you're like me, you're a charismatic force of nature whom everyone loves unconditionally. However, if your job is like mine, you may often find yourself wanting to remotely remove a user's profile cache from a machine.

A profile cache is something a roaming domain profile creates when a user first logs into a machine. It holds the user's "CURRENT_USER" registry hive (ntuser.dat) as well as any profile folders that are not being redirected. If your domain is set up to download redirected folders (offline file sync), it also stores those.

Why would you want to remove that? Various reasons. My usual use case is that one of my terminal servers (RDS) is running out of disk space. We have upwards of several thousand students who could potentially use our servers, but usually only a few hundred will use them in a given week. This means provisioning enough space to hold several thousand user profiles (this would be several TB) would be too expensive and would never be necessary with proper profile management. Other reasons to remove local caches include corrupt profiles, old or bad settings stored in AppData, long logins or other login problems, and possibly security (but only if password caching is enabled).

So how do we clear them? Windows has two native ways to do this: manually through the System Properties interface, or through Group Policy using the "remove profiles older than x days" GPO. The first method works, but is limited. You must be logged into the machine (locally or over RDP). You can only delete one profile at a time, and since each removal takes several seconds, clearing a large number of profiles takes a really long time. And the "manage user profiles" window can take a really long time (30 minutes or more) to actually open when there are hundreds of profiles on the machine. The GPO method is a bit better, but has two major caveats. First, the machine must be rebooted for the purge to run, which means it cannot be done on demand while the system is in use. Second, it cannot target specific accounts.

So we have a situation in which no real tool exists to do what we want. DelProf2 is one option I've looked at before, and while I'm sure it's a perfectly functional tool, as a rule I don't like running software to automate tasks when I don't know exactly what it's doing (from their change log it appears to be a very manual process). So from the shortcomings of the methods above, we want the following features:
  • Ability to delete multiple profiles quickly
  • No reboot required
  • Command line for batching/automation
  • Ability to target specific profiles, or delete all older profiles
  • Fully remove all registry info and files of user's account
  • Do not have to be logged in / can be done remotely
Turns out all of this can be done via PowerShell and WMI.

 Enter PowerShell

First things first, here is the full code for the three different functions so you can follow along. They should be pretty easy to read, they're commented and have full get-help integration (except the first one, because it's very basic).
http://pastebin.com/SSKJ4bwt (Convert-UTCtoDateTime)
http://pastebin.com/wvUDki7p (Get-UserProfile)
http://pastebin.com/9Lu3rP9q (Remove-UserProfile)

The first one I've already talked about in a previous post. It's a simple function to convert UTC time strings into a datetime object that PowerShell understands.

Let's look at the next one then.

Get-UserProfile.

I'm going to skip over the get-help information, as covering it would be redundant. So the first bit of code is this:

    [CmdletBinding()] 
      param( 
     [Parameter(Mandatory=$False)][string]$UserID="%",
     [Parameter(Mandatory=$False)][string]$Computer="LocalHost",
     [Parameter(Mandatory=$False)][switch]$ExcludeSystemAccounts,
     [Parameter(Mandatory=$False)][switch]$OnlyLoaded,
     [Parameter(Mandatory=$False)][switch]$ExcludeLoaded,
     [Parameter(Mandatory=$False)][datetime]$OlderThan   
     
    )

These are our Cmdlet bindings. They allow us to pass parameters cleanly to the function. Notice nothing here is mandatory. If no parameters are passed to the function, it defaults to returning all profiles on the local system. The UserID and Computer parameters are given default values if none is specified ('%' is WMI speak for "all", analogous to '*' or '.*' in most regex systems).

Next:

if(!(Get-Command Convert-UTCtoDateTime -ErrorAction SilentlyContinue)){
    write-host -BackgroundColor "Black" -ForegroundColor "Red" "################################################################################"
    write-host -BackgroundColor "Black" -ForegroundColor "Red" "#                                                                               "
    write-host -BackgroundColor "Black" -ForegroundColor "Red" "This Program Requires cmdlet ""Convert-UTCtoDateTime""                          "
    write-host -BackgroundColor "Black" -ForegroundColor "Red" "Find it here:                                                                   "
    write-host -BackgroundColor "Black" -ForegroundColor "Red" "http://bisbd.blogspot.com/2014/10/adventures-in-powershell-converting-utc.html  "
    write-host -BackgroundColor "Black" -ForegroundColor "Red" "#                                                                               "
    write-host -BackgroundColor "Black" -ForegroundColor "Red" "################################################################################"
    break;
}

Here we check to make sure our dependent function is loaded. If not, it displays this lovely warning with a link.
Next:
if($Computer.ToLower() -eq "localhost"){
    
    
    $Return = Get-WmiObject -Query "Select * from win32_userprofile where LocalPath like '%\\$UserID'" 
    

}
else{
    $Return = get-wmiobject -ComputerName $Computer -Query "Select * from win32_userprofile where LocalPath like '%\\$UserID'" 
}

OK, first real code. Here we do a quick check to see if we're running on the localhost or against a remote machine. The WMI query we run is the same for both, but the parameters we pass to the Get-WmiObject Cmdlet are slightly different.

On my machine doing -ComputerName $Computer actually works with 'localhost', but only because 'localhost' is defined as 127.0.0.1 in the default hosts file. This might be a safe assumption to make on any Windows system, but I try not to make assumptions when I can avoid it.

About the WMI query. For anyone familiar with SQL this will probably look pretty familiar, though the terminology is a bit different. We're selecting everything from the class "win32_userprofile" where the property "LocalPath" ends with our UserID with a backslash in front of it. The backslash is there to prevent UIDs which are a substring of another UID from returning erroneous results. For example, if you had users 'bob' and 'jimbob', searching for 'bob' would return both bob and jimbob without the backslash in front.
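To see why the leading backslash matters, here is a small sketch that uses PowerShell's own -like operator as a stand-in for WQL's LIKE (so '*' plays the role of '%'). The sample paths are made up for illustration; nothing here touches WMI.

```powershell
# Hypothetical profile paths; in the real function these come from Win32_UserProfile.LocalPath
$paths = 'C:\Users\bob', 'C:\Users\jimbob'

# Without the backslash, 'bob' also matches the tail of 'jimbob'
$loose = $paths | Where-Object { $_ -like '*bob' }

# With the backslash, only the folder named exactly 'bob' matches
$strict = $paths | Where-Object { $_ -like '*\bob' }

"Loose match:  $($loose -join ', ')"
"Strict match: $($strict -join ', ')"
```

The loose pattern returns both paths, while the strict one returns only C:\Users\bob.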

Here's a command to run to see everything this query returns.

Get-WmiObject -Query "Select * from win32_userprofile"

What you'll see, if this runs correctly, is a mess. There are only a few properties in here that are useful (which we'll filter out in a minute). A noticeable omission is that there is no Username-type field. I just wanted to point this out because it might seem strange to be looking at the LocalPath property otherwise.
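Since there is no Username field, one way to get a user name is to take the last segment of LocalPath. This is only a sketch with an invented path, and it assumes the profile folder is named after the account, which is usual but not guaranteed (Windows sometimes appends a suffix such as the domain name).

```powershell
# Hypothetical LocalPath value, as Win32_UserProfile would report it
$localPath = 'C:\Users\bob'

# Take everything after the last backslash as the user name
$userName = $localPath.Substring($localPath.LastIndexOf('\') + 1)
$userName
```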

So after this block of code we have a full list of user profiles stored in our "$Return" variable. Now it's time for some filtering.

#Filter System Accounts
if($ExcludeSystemAccounts){
    $Return = $Return | Where-Object -Property Special -eq $False
}
#Filter out Loaded Accounts
if($ExcludeLoaded){
    $Return = $Return | Where-Object -Property Loaded -eq $False
}
#Keep only loaded accounts
if($OnlyLoaded){
    $Return = $Return | Where-Object -Property Loaded -eq $True
}


Here are the first three filters. They're all pretty much the same. First they check if their switch has been set, then use Where-Object to filter on a certain property. I do these all as individual if statements that modify the $Return variable so that they can be chained together.

The two properties we're looking at here are "Special" and "Loaded". The Special property tells us if an account is a non-user (i.e. system) account. You'll see things like "system", "network service", etc. listed as special. The "Loaded" property tells us if the account is currently in use. This property will be important later, as you can't remove accounts that are currently loaded.

My inclusion of an "OnlyLoaded" flag might seem strange here. This is not directly related to the removal of user accounts, but is additional functionality. Combine "-OnlyLoaded" and "-ExcludeSystemAccounts" and you can find out which user(s) are logged into the machine. Neat!
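Here is a sketch of how the chained filters behave, using mock objects in place of real Win32_UserProfile results so it can be run anywhere. All paths and flag values are invented.

```powershell
# Mock objects standing in for Win32_UserProfile results (values are invented)
$Return = @(
    [pscustomobject]@{ LocalPath = 'C:\Users\bob';   Special = $false; Loaded = $true  }
    [pscustomobject]@{ LocalPath = 'C:\Users\alice'; Special = $false; Loaded = $false }
    [pscustomobject]@{ LocalPath = 'C:\Windows\ServiceProfiles\NetworkService'; Special = $true; Loaded = $true }
)

# Equivalent of -ExcludeSystemAccounts followed by -OnlyLoaded
$Return = $Return | Where-Object { -not $_.Special }
$Return = $Return | Where-Object { $_.Loaded }

# What is left is the profile of the user who is currently logged in
$Return.LocalPath
```

Each filter narrows $Return a little further, which is why the order of the if statements doesn't matter.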

Let's look at the last filter now.
#Filter on lastusetime
if([bool]$OlderThan){
$Return | Where-Object -property LastUseTime -eq $Null | % {Write-Host -BackgroundColor "Black" -ForegroundColor "Yellow" $_.LocalPath " Has no 'LastUseTime', omitting" }
$Return = $Return | Where-Object -property LastUseTime -ne $Null
$Return = $Return | Where-Object {$(Convert-UTCtoDateTime $_.LastUseTime -ToLocal) -lt $OlderThan }
}

This one has a bit more going on.

The if statement looks a bit different. Because the variable is a "system.datetime" object rather than a boolean, I'm typecasting it as a boolean. If the variable has been populated, this returns true; if the variable is $Null (that is, was not set), it returns false. The typecast isn't strictly necessary, since simply doing "if($OlderThan)" would return the same thing. It's mostly just for readability.
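If you'd rather test explicitly whether the caller passed the parameter, $PSBoundParameters is another option. This is an alternative sketch, not what the function above does, and Test-OlderThanSupplied is a made-up name for illustration.

```powershell
function Test-OlderThanSupplied {   # hypothetical helper, for illustration only
    [CmdletBinding()]
    param(
        [Parameter(Mandatory=$False)][datetime]$OlderThan
    )
    # True only when the caller actually passed -OlderThan
    $PSBoundParameters.ContainsKey('OlderThan')
}

Test-OlderThanSupplied
Test-OlderThanSupplied -OlderThan (Get-Date)
```

The first call reports that nothing was supplied, the second that -OlderThan was bound.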

The next lines warn the user that it's skipping over any user accounts with a $Null "LastUseTime" property. This is one aspect that may need to be modified in the future, but I don't think so. I have never seen a $Null LastUseTime on an actual user account. Mostly it shows up on accounts created by programs. For example, my computer has ".Net v4.5 Classic", "DefaultAppPool", and ".Net v4.5" as accounts with no LastUseTime. Even local users who have been created but never logged in won't get caught by this, because they don't show up at all until their first logon, at which point they'll get a LastUseTime.

Finally, after filtering out the $Null entries, we convert the LastUseTime to a datetime object and compare it to the datetime passed to the -OlderThan parameter. By default the LastUseTime is a very ugly string that is difficult to make sense of at a glance. More importantly, to do any sort of date math, PowerShell needs it in a datetime object. So this is where the Convert-UTCtoDateTime function comes into play. This function takes the ugly UTC string and turns it into something PowerShell can understand.

One caveat here: the -ToLocal flag turns out to be important. When doing date math, PowerShell evidently doesn't take time zones into consideration. So it is necessary to have both dates be in local time before doing math, otherwise it might not behave as expected.
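A small sketch of the time zone pitfall, using two hand-built dates rather than real profile data. It assumes PowerShell 5 or later for the ::new() syntax.

```powershell
# Same clock reading, tagged once as UTC and once as local time
$asUtc   = [datetime]::new(2014, 10, 15, 16, 3, 19, [System.DateTimeKind]::Utc)
$asLocal = [datetime]::new(2014, 10, 15, 16, 3, 19, [System.DateTimeKind]::Local)

# The comparison only looks at the clock reading and ignores the Kind,
# so these count as equal even though they are different instants in most time zones
$asUtc -eq $asLocal

# Converting both sides to the same zone first gives a meaningful comparison
$asUtc.ToLocalTime() -eq $asLocal
```

The first comparison is True regardless of your time zone. The second depends on your machine's offset from UTC, which is the whole point.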



Next, and final block:

if($PSBoundParameters['Verbose'])
{
Write-Output $Return
}
else{
 Write-Output $Return | Select SID,LocalPath,@{Label="Last Use Time";Expression={Convert-UTCtoDateTime $_.LastUseTime -ToLocal}}    
}


Here we process our output. I've added support here for the PowerShell verbose flag. -Verbose is a native flag in PowerShell, which all Cmdlets have, even if they don't do anything with it. Normally this is used with the Write-Verbose Cmdlet, but I've done a little more. I didn't just want to write something additional when verbose is set; I wanted the output formatted differently, that is, unformatted. This is necessary for this Cmdlet's integration with the Remove-UserProfile Cmdlet. So I check to see if verbose has been set, and if it has, simply return the $Return variable. If verbose is not set, I format the output to look nice and show only the relevant information.

The relevant information here is the SID (Profile unique identifier), the LocalPath (c:\users\myuser), and the LastUseTime. The LastUseTime I modify with Convert-UTCtoDateTime to make it look nicer and be more useful at a glance.
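The formatted branch relies on a calculated property in Select-Object. Here is a minimal sketch of that technique on a mock object. The SID and path are invented, and an inline ParseExact call stands in for Convert-UTCtoDateTime so the snippet runs on its own.

```powershell
# Mock profile object; SID and path are invented, the timestamp is the sample string from the conversion post
$mockProfile = [pscustomobject]@{
    SID         = 'S-1-5-21-0-0-0-1001'
    LocalPath   = 'C:\Users\bob'
    LastUseTime = '20141015160319.191919+000'
    Loaded      = $false
}

$row = $mockProfile | Select-Object SID, LocalPath, @{
    Label      = 'Last Use Time'
    # Stand-in for Convert-UTCtoDateTime: parse the first 14 characters (yyyyMMddHHmmss)
    Expression = { [datetime]::ParseExact($_.LastUseTime.Substring(0, 14), 'yyyyMMddHHmmss', [System.Globalization.CultureInfo]::InvariantCulture) }
}
$row
```

The result has only the three selected columns, with the raw string replaced by a real datetime under a friendlier label.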

That's about all there is to Get-UserProfile. Next we'll look at Remove-UserProfile, which uses Get-UserProfile.

Remove-UserProfile

Remove-UserProfile is very much an extension of Get-UserProfile. At a high level, it uses Get-UserProfile to obtain a list of user profiles, then deletes them. That's really about it. Obviously there are a few checks and things in here as well, so let's go through them.

$ProfileList = Get-UserProfile -Verbose -UserID $UserID -Computer $Computer -ExcludeSystemAccounts -OlderThan $OlderThan

Since we've already done all the big work in the Get-UserProfile Cmdlet, all we need to do is call it with the appropriate flags. We use verbose so we get the full object, not just the filtered information. We exclude system accounts because we don't want to delete those for what I hope are obvious reasons -- I'm not sure that it'd actually let you, but better to be safe. We also use the -OlderThan flag regardless of whether the user has actually specified this.

Looking back at the parameter bindings, you see I've included a default value for $OlderThan that is one day in the future. This is for a couple of reasons. First, it's way more readable: no nested if statements with different queries. Second, this filters out the not-system-but-also-not-user accounts. I haven't tried removing these accounts to see what would actually happen, but I'm sure .NET would be none too happy about it.
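The full parameter block lives in the pastebin link, so the following is only a sketch of what a "one day in the future" default could look like. Get-DefaultCutoff is a made-up name, and the exact default expression is an assumption.

```powershell
function Get-DefaultCutoff {   # hypothetical name; only the parameter default matters here
    [CmdletBinding()]
    param(
        # Defaults to tomorrow, so every profile with a LastUseTime counts as "older"
        [Parameter(Mandatory=$False)][datetime]$OlderThan = (Get-Date).AddDays(1)
    )
    $OlderThan
}

Get-DefaultCutoff                                      # tomorrow
Get-DefaultCutoff -OlderThan (Get-Date).AddDays(-30)   # explicit cutoff
```

With the default in place, the call to Get-UserProfile can always pass -OlderThan without caring whether the user supplied one.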

Next block

    if(!$ProfileList){
        Write-Warning "NO USER PROFILES WERE FOUND"
        RETURN;
    }

This is a simple $Null check to make sure the query actually returned something. If no profiles matched the criteria, the script exits.


    if(!$Batch){
        Write-Warning "ABOUT TO REMOVE THE FOLLOWING USER ACCOUNTS"
        Foreach($User in $ProfileList){
            $User | Select SID,LocalPath,@{Label="Last Use Time";Expression={Convert-UTCtoDateTime $_.LastUseTime -ToLocal}}
        }
        $Title = "PROCEED?"
        $Message = "ARE YOU SURE YOU WANT TO REMOVE THE LISTED USER ACCOUNTS?"
        $Yes = New-Object System.Management.Automation.Host.ChoiceDescription "&Yes","Removes User Accounts"
        $No = New-Object System.Management.Automation.Host.ChoiceDescription "&No","Exits Script, No Changes Will be Made"
        $options = [System.Management.Automation.Host.ChoiceDescription[]]($yes, $no)
        $result = $host.ui.PromptForChoice($title, $message, $options, 1) 
        switch ($result)
        {
            0 {}
            1 {return;}
        }

    }

The next bit here is a confirmation dialog. This is a built-in PowerShell feature you can read more about here, but a few quick notes about my implementation. First, if the -Batch flag is set, it skips this. This is important, as otherwise the script would always require user confirmation, which would make it far less useful from an automation standpoint.

The foreach loop here lists out (in a nice format) all the user profiles to be deleted. This is a nice sanity check for the user to make sure they know what they're deleting.

On the choices, "$yes"/0 does nothing, and "$no"/1 exits the script, with the default being no. I wrote it this way to make the coding easier. With this continue/exit method, the rest of the Cmdlet doesn't have to be embedded within the "switch($result)" block, which makes the code much more readable and the -Batch code easier to write.


    Foreach($User in $ProfileList){
        if($User.Loaded){
            if(!$Batch){
            Write-Host -BackgroundColor "Black" -ForegroundColor "Red" "User Account " $User.LocalPath "is Currently in use on" $Computer ":`tSkipping"
            }
            else{
            Write-Output "User $($User.LocalPath) on $($Computer) was in use and could not be removed"
            }
            continue;
        }
        if(!$Batch){
        Write-Host -BackgroundColor "Blue" -ForegroundColor "Green" "Removing User $($User.LocalPath) from $($Computer)"
        }
        else{
        Echo "Deleting $($User.LocalPath) from $($Computer)"
        }
        $User.delete()


    }

Now we get into the actual deleting. A simple foreach loop deletes everything that was returned by the Get-UserProfile Cmdlet. A few things to look at in here. I use if(!$Batch) in a couple of places. This is for formatting reasons. The only difference between the batch and non-batch output is the method of writing. Batch uses Write-Output (aka echo), which is nice because it can be redirected to a log file. However, Write-Output lacks a lot of formatting options. So in non-batch mode I use Write-Host, which cannot be redirected to a file, but gives us some formatting/coloring options to make the output more readable.

Next, let's look at the if($User.Loaded). As discussed in Get-UserProfile, the Loaded property tells us whether or not the profile is currently in use. It's important to filter these out, otherwise PowerShell will throw errors when you try to delete the profile. Why not use the -ExcludeLoaded flag we created in Get-UserProfile? I debated this for a while actually, but decided it would be frustrating if you were trying to delete a specific profile and the script kept saying "no profiles found". This way provides more information, even if it wastes a bit more time.

And lastly, we delete the profile. "$User.Delete()" is really all it takes.

These three functions are about 250 lines all together. And you could get all the same functionality in this.

  Get-WmiObject -Computer MyComputer.mydomain -Query "Select * from win32_userprofile where LocalPath like '%\\MyUser'" | % {$_.Delete()}

Not entirely sure my aim in pointing this out. Maybe that there's a tradeoff between writing something you know, and something other people could use?

Anyway, hope someone else can get some use out of this. I know it's something that's bugged me for a long time. 

Wednesday, October 15, 2014

Adventures in PowerShell: Converting UTC to DateTime

Intro

Ran into a problem recently. I'm working on a script (more on this later) that will let me pull a list of user profiles on a remote machine. The problem is that the user profile's "last use time", a bit of information I would like to have, is a System.String object in UTC format, meaning it looks like this:

20141015160319.191919+000
Pretty standard stuff. Except PowerShell doesn't know what to do with it. It blew my mind when I learned this. PowerShell doesn't have a native function to convert this type of string to a date (that is, a 'System.DateTime' object). After an hour of Googling I couldn't find a good go-to script to handle it (at least, not one in PowerShell; there were several in VB or C#). So I wrote my own. It's a pretty simple script, and it relies on the string being in the specific format above, that is:

yyyyMMddhhmmss.ffffffzzz
 I may update this script if I find additional formats -- but being a standard, this should work in most places.

One note is that while the format specifies down to 6 decimal places, Get-Date's -Millisecond parameter only takes 3, so the trailing 3 are dropped.

This will convert to your local timezone with the '-ToLocal' flag


The Code:

function Convert-UTCtoDateTime{
<#

  Author: Keith Ballou
  Date: 10/15/14

#>

    #Parameter Binding
    [CmdletBinding()]
    param(
        [Parameter(Mandatory=$True,Position=1)][string]$UTC,
        [Parameter(Mandatory=$false)][switch]$ToLocal
        )

    #Breakout the various portions of the time with substring
    #This is very inelegant, and UTC
    $yyyy = $UTC.substring(0,4)
    $M = $UTC.substring(4,2)
    $dd = $UTC.substring(6,2)
    $hh = $UTC.substring(8,2)
    $mm = $UTC.substring(10,2)
    $ss = $UTC.substring(12,2)
    $fff = $UTC.substring(15,3)
    $zzz = $UTC.substring(22,3)

    #If local, add the UTC offset returned by get-date
    if($ToLocal){
    (get-date -Year $yyyy -Month $M -Day $dd -Hour $hh -Minute $mm -Second $ss -Millisecond $fff) + (get-date -format "zzz")
    }
    #else just return the UTC time
    else{
    get-date -Year $yyyy -Month $M -Day $dd -Hour $hh -Minute $mm -Second $ss -Millisecond $fff
    }
}
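For comparison, here is a sketch of the same conversion done with [datetime]::ParseExact instead of substrings. It's an alternative, not the script above: Convert-DmtfToDateTime is a made-up name, and the sketch assumes the offset at the end of the string is +000 (that is, the timestamp is already UTC, as in the sample). Marking the result as UTC lets .ToLocalTime() apply the time zone rules for that particular date.

```powershell
function Convert-DmtfToDateTime {   # hypothetical name, not part of the original script
    [CmdletBinding()]
    param(
        [Parameter(Mandatory=$True,Position=0)][string]$UTC,
        [Parameter(Mandatory=$False)][switch]$ToLocal
    )
    # Keep 'yyyyMMddHHmmss.fff' (18 characters); the extra fraction digits and the offset are dropped
    $stamp = $UTC.Substring(0, 18)

    # Assumption: the string is UTC, so parse it as UTC and keep it tagged as UTC
    $styles = [System.Globalization.DateTimeStyles]'AssumeUniversal, AdjustToUniversal'
    $parsed = [datetime]::ParseExact($stamp, 'yyyyMMddHHmmss.fff',
                  [System.Globalization.CultureInfo]::InvariantCulture, $styles)

    if ($ToLocal) { $parsed.ToLocalTime() } else { $parsed }
}

Convert-DmtfToDateTime '20141015160319.191919+000'
Convert-DmtfToDateTime '20141015160319.191919+000' -ToLocal
```

The first call returns 16:03:19.191 on October 15, 2014 tagged as UTC; the second returns the same instant shifted to whatever your local zone is.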

Tuesday, September 30, 2014

More Printing Problems - Spooler and Citrix Print Manager Crash

Solution

Today the solution appears to be to use the HP UPD (I'm using 5.8, for the record), recreate the print queues (fancy name for shared printers), and have a nice script to handle failures elegantly.

Use of the HP Universal Print Driver is pretty much mandated for use in a terminal server environment (check out the HP compatibility list here (pdf)). Most HP printers new/old do not support the use of their device specific driver in a Citrix Xen* environment. 

Recreating the print queues apparently helps for not-quite-adequately-explained reasons. Apparently switching which driver a print queue is using can cause some sort of corruption that can crash the spooler. So you have to delete and re-add the shared printer (you can use the same port, etc.), but starting with the UPD driver rather than using the device-specific one and changing later. I've done this for a couple of printers and it has already drastically reduced the number of crashes. I will be recreating other printers that seem to be causing problems and will update this with progress.

Having a script that handles failures nicely is key to reducing user impact. Printers are always problematic, especially in a Xen/RDS environment, so expecting 0 crashes is probably overly optimistic. I've written a script that runs when a service is detected to have failed (I do this through the 'recovery' tab in the service properties). The script restarts both the spooler and the Citrix Print Manager Service. For some reason, if these two are not restarted together, they don't seem to talk to each other. So whenever one fails, both need to be restarted. I'll put the script at the bottom of the article.


Introduction


We've had more problems with printing since fixing the issue with the Citrix Print Client. The issue now is that the print spooler on our terminal server keeps crashing and it causes people not to be able to print, printers not to map, etc.

In brief, there are two services, "Citrix Print Management" (CpSvc) and "Print Spooler" (spooler). Even though we are no longer using the Citrix Print Client, we still need the CpSvc service because it handles the mapping of printers through Citrix Group Policy. The Citrix Group Policy gives us some additional functionality when mapping printers that would be difficult to replicate in normal AD group policy. Anyway, when either of those services crashes it breaks everything; simply setting the service to restart on crash doesn't work either. The processes must be restarted together, otherwise they don't seem to talk to each other.

I've written a small script that restarts both services whenever one fails, which minimizes the impact of a failure, but I'm still working on solving the underlying problem. So it's time for another long rambling post trying to figure out what's happening. The last one went pretty well, so let's give it a shot.

Environment

XenDesktop Controller: Server 2008r2SP1,  XenDesktop 7.0, Physical Server with way more resources than necessary
XenDesktop Hosted Desktop: Server 2008r2SP1, runs RDS server/ XenDesktop 7.1, clients connect from Wyse Xenith2 thin clients, about 100 possible clients, but generally have only 30-50 at any given time.
Print Server: Server 2008R2SP1, Microsoft Print Server

Errors in Event Viewer

Here's a brief rundown of the various errors I've gotten and what I've been able to find out about each.

splwow64.exe crash

Type: Error
Source: Application Error
EventID: 1000
Task Category: (100)
Message:
Faulting application name: splwow64.exe, version: 6.1.7601.17777, time stamp: 0x4f35fbfe
Faulting module name: ntdll.dll, version: 6.1.7601.18247, time stamp: 0x521eaf24
Exception code: 0xc0000374
Fault offset: 0x00000000000c4102
Faulting process id: 0xeb30
Faulting application start time: 0x01cfd83a3b2e4f74
Faulting application path: C:\Windows\splwow64.exe
Faulting module path: C:\Windows\SYSTEM32\ntdll.dll

splwow64.exe is a process that translates the x64 print drivers for use by 32-bit applications. For example, all of our print drivers must be 64-bit because the server is running 2008R2SP1, but the server runs 32-bit Office (for various plug-in compatibility). When Office wants to print, it has to go through splwow64.exe because it wouldn't know what to do with a 64-bit driver.

As for why this crashes, I have no idea. You see the "faulting module" is one "ntdll.dll", and the error code is "0xc0000374". ntdll.dll is explained here in Wikipedia; I'd try to summarize, but since my understanding is vague at best, it's probably best to read it yourself. "0xc0000374" is an error code that indicates "Heap Corruption", which is a fancy way of saying the memory was modified in a way that wasn't expected. Neither of these bits of information is particularly insightful, but they come up over and over in these errors.


spoolsv.exe crash

Type: Error
Source: Application Error
EventID: 1000
Task Category: (100)
Message: 
Faulting application name: spoolsv.exe, version: 6.1.7601.17777, time stamp: 0x4f35fc1d
Faulting module name: ntdll.dll, version: 6.1.7601.18247, time stamp: 0x521eaf24
Exception code: 0xc0000374 OR 0xc0000005
Fault offset: 0x00000000000c4102
Faulting process id: 0x9044
Faulting application start time: 0x01cfd8232b50fcd7
Faulting application path: C:\Windows\System32\spoolsv.exe
Faulting module path: C:\Windows\SYSTEM32\ntdll.dll
This error is very similar to the splwow64.exe one above, with the exception of the "0xc0000005" variant. This, as far as I can tell, is a memory access violation.

Couldn't Load Print Processor

Type: Error
Source: PrintService
EventID: 365
Task Category: Initializing a print Processor
Message:
Windows could not load print processor hpcpp160 because EnumDatatypes failed. Error code 126. Module: 18\hpcpp160.dll. Please obtain and install a new version of the driver from the manufacturer (if available), or choose an alternate driver that works with this print device.

This error is a bit more informative. There are variants for other print processors (e.g. hpzppwn7). hpcpp160 happens to be the HP universal print driver (version 5.8). Anyway it's our first indication of something wrong with the print service. Problem is, most of the time this print processor has no problems. Most of our printers use the HP UPD, and they work most of the time.

I've also tried reinstalling the UPD (on the client; doing it on the server would require more extended downtime). This hasn't had an effect.

Citrix -- "Environment is incorrect" "no printers were found" "printer auto-creation failed"

Type: Error
EventID: 1114 / 1116

I'm not going to list out the full text of these because they are erroneous (at least as far as my problem goes) -- see this forum for more info. Basically these errors can be logged even if printer creation is succeeding. It's very obnoxious. It's possible it's logged because printers are not being deleted at logoff, but I haven't found anything to suggest that is still an issue (that forum post is pretty old).

Print Spooler Can't copy file

Note: you must enable the PrintService operational log to see this error. In Event Viewer, find it under Applications and Services > Microsoft > Windows > PrintService. Right-click the Operational log and select "Enable Log".

Type: Error
Source: PrintService
EventID: 811
Task Category: Executing a file operation
Message:
The print spooler failed to move the file C:\Windows\system32\spool\PRTPROCS\x64\hpcpp160.dll to C:\Windows\system32\spool\PRTPROCS\x64\202_hpcpp160.dll, error code 0xb7. See the event user data for context information.

This message again may vary with the print processor (replace hpcpp160.dll with whatever.dll). This is an odd message. The folder it references is full of duplicate print processor dlls (1_hpcpp160.dll to 499_hpcpp160.dll). I have no idea why, this is the current lead I'm working on.
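For quick reference, the numeric codes quoted in the event entries above map to standard Windows status and error names. The lookup below is just a convenience sketch; Get-CodeMeaning is a made-up helper, and the table only covers the four codes that appear in this post.

```powershell
# Codes quoted in the event log entries above, with their standard Windows meanings
$codes = @{
    '0xc0000374' = 'STATUS_HEAP_CORRUPTION (NTSTATUS): heap corruption detected'
    '0xc0000005' = 'STATUS_ACCESS_VIOLATION (NTSTATUS): invalid memory access'
    '126'        = 'ERROR_MOD_NOT_FOUND (Win32): the specified module could not be found'
    '0xb7'       = 'ERROR_ALREADY_EXISTS (Win32, decimal 183): cannot create a file that already exists'
}

function Get-CodeMeaning {   # hypothetical helper
    param([string]$Code)
    $key = $Code.ToLower()
    if ($codes.ContainsKey($key)) { $codes[$key] } else { "No entry for $Code" }
}

Get-CodeMeaning '0xC0000374'
Get-CodeMeaning '126'
Get-CodeMeaning '0xb7'
```

Read this way, the 811 event says the spooler tried to move hpcpp160.dll to a numbered name that was already taken, which fits with the folder being full of numbered copies.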

Things I've Tried

  1. Create Script that restarts processes
    1. First did this restarting "spooler" and "cpsvc" every 5 minutes
      1. This technically worked, but caused some strange behavior and is overly inelegant
    2. Set "spooler" and "cpsvc" to run the script when either crashes
      1. this can be done in the services MMC snap in.
      2. Still doesn't solve the underlying problem, but is a nice band-aid fix until I can figure the bigger issue out
      3. It's also way more elegant than the "restart every 5 minutes" solution.
      4. Note: had to change CpSvc to log on as a local service with permission to interact with desktop (was just local service), otherwise the script wouldn't run correctly when it failed.
  2. Moving printers to the HP UPD
    1. Thought here is that one of the device-specific print drivers wasn't terminal compatible
    2. This hasn't exactly panned out. Moved device-specific printers to the UPD but errors continue to show up. I've just finished this migration recently, so maybe it'll pay off over time.
  3. Clearing out system32\spool\prtprocs\x64
    1. This folder is full of duplicate .dll files (see "print spooler can't copy file" above) 
    2. Found out not to delete everything from that folder. The WinPrint.dll file will not recreate itself.
    3. Print spooler still crashed (0xc0000005) at like 5am when no one would have been using it. So that's fun.
      1. Actually, someone did log in at 5am, just seconds before the crash. So that's something to go on maybe.
    4. Printing hasn't gotten any worse, so at least I haven't broken anything
    5. This at least seems to have stopped the 811 errors. Watching to see if the prtprocs folder starts to build up again.
  4. Recreating Print Queue
    1. Some things I read suggested that print queues may become corrupt when switching drivers/print processors.
      1. Print queue is the technical term for a shared printer
    2. So I deleted and recreated the print queues that seemed to be causing the most issues
      1. "most issues" was determined by cross referencing the crash time with our user tracking to determine which stations (and thus which printers) were most recently logged into before the crash.
    3. This actually appears to have had some effect. I've only had one crash (and it was the splwow64.exe crash, not the spooler or CpSvc) today.
      1. Today's Friday, so load is low, but will continue to monitor.
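The cross-referencing described under "Recreating Print Queue" can be sketched roughly like this. All of the data below is invented (station names, printers, times), and the five-minute window is an arbitrary choice; the real user tracking data would come from wherever you log logons.

```powershell
# Invented sample data: one crash time and a few logon records from user tracking
$crashTime = [datetime]'2014-09-26 05:00:30'
$logons = @(
    [pscustomobject]@{ Station = 'THIN-01'; Printer = 'Lab-A'; LogonTime = [datetime]'2014-09-26 04:59:50' }
    [pscustomobject]@{ Station = 'THIN-02'; Printer = 'Lab-B'; LogonTime = [datetime]'2014-09-26 03:10:00' }
    [pscustomobject]@{ Station = 'THIN-03'; Printer = 'Lab-A'; LogonTime = [datetime]'2014-09-26 05:02:00' }
)

# Keep logons that happened in the five minutes leading up to the crash
$window = [timespan]::FromMinutes(5)
$suspects = $logons | Where-Object {
    $_.LogonTime -le $crashTime -and ($crashTime - $_.LogonTime) -le $window
}

$suspects | Select-Object Station, Printer, LogonTime
```

Run over several crashes, the printers that keep turning up in the result are the ones worth recreating first.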

Days Pass....
Spooler/CpSvc/splwow64 continue to crash, but much more infrequently. The average is maybe once or twice a day, much lower than the every couple of hours it used to be. I am going to continue to recreate print queues and see if I can eliminate the crashing altogether, and will update this page as I learn more.

RestartPrintServices.ps1

# Stop the Citrix Print Manager first
Write-Host "Shutting down Citrix Print Manager"
Stop-Service -Force cpsvc
Write-Host "Waiting for CpSvc to shut down gracefully" -NoNewline

# Check once a second; after 5 checks, give up and kill the process
$count = 0
while ((Get-Service cpsvc).Status -ne "Stopped")
{
    $count++
    if ($count -gt 5)
    {
        Write-Host ""
        Write-Host "CpSvc has not shut down gracefully, killing the process"
        Stop-Process -Force -Name cpsvc
        break
    }
    Write-Host "." -NoNewline
    Start-Sleep 1
}
Write-Host ""

# Same again for the print spooler (its process is spoolsv.exe)
Write-Host "Shutting down Print Spooler"
Stop-Service -Force spooler
Write-Host "Waiting for Spooler to shut down gracefully" -NoNewline

$count = 0
while ((Get-Service spooler).Status -ne "Stopped")
{
    $count++
    if ($count -gt 5)
    {
        Write-Host ""
        Write-Host "Spooler has not shut down gracefully, killing the process"
        Stop-Process -Force -Name spoolsv
        break
    }
    Write-Host "." -NoNewline
    Start-Sleep 1
}
Write-Host ""

# Bring the services back up, spooler first
Write-Host "Bringing Spooler back up"
Start-Service spooler

Write-Host "Bringing Citrix Print Manager back up"
Start-Service cpsvc

# Log the time of the restart (c:\temp must already exist)
Get-Date >> c:\temp\restartprinters.txt

 


Wednesday, September 10, 2014

Printing issues on Win 7 Virtual machines with XenDesktop -- Citrix Universal Print Client

Solution

The fix for this ends up being to remove the "Citrix Universal Print Client" from the XenDesktop clients. According to certain sources, this happens when the UPC is attempting to contact the Universal Print Server, even when a Universal Print Driver isn't being used. The server obviously doesn't respond, and so there's a considerable timeout before it falls back to Windows printing.

I'm skeptical that this is actually what's happening, at least in my environment, for a couple of reasons.
  1. The timeout occurs even when using the UPS/UPD/UPC.
  2. The timeout occurs for some printers/drivers and not for others

The only way I've found to remove the UPC is to reinstall the XenDesktop Client without the UPC. From command line it looks something like:

#note: reboot before running the first command, and between each command. If you don't specify the "/noreboot" flag, it will reboot automatically after each command.
#<XenDesktopDir> = unzipped iso location
#<XDInstaller> = <XenDesktopDir>\x86\XenDesktop Setup\XenDesktopVdaSetup.exe
#   or for 64-bit = <XenDesktopDir>\x64\XenDesktop Setup\XenDesktopVdaSetup.exe

#Remove Current install
<XDInstaller> /quiet /removeall

#Reinstall Without UPC
#You can check here for the flags you need for any other customizations you make
<XDInstaller> /quiet /Components vda /EXCLUDE "Citrix Universal Print Client" /logpath "c:\temp\xdinstalllogs"

#Configure Controller hostname/port
#If you set this through group policy, you shouldn't need this step
<XDInstaller> /quiet /reconfigure /controllers "mycontroller.mydomain.com" /portnumber 9999

 

Introduction

Been having some issues with printing from our thin clients. The most common symptom is that whatever program is trying to print (Word, Notepad, browser; it appears to affect all about the same) locks up for 30-60 seconds. During this time a window saying "connecting to printer" may be present (though not always), and the main program window appears unresponsive ("Not Responding" in Task Manager).

I'm writing this as I troubleshoot, so apologies if it's a bit schizophrenic.

Setup

Client: Windows 7 x86 - fully updated -- mostly, some are not 100% updated, but all have at least SP1, and it doesn't appear to make much of a difference. Clients have XenDesktop Client installed (7.0/7.1 -- I end up trying 7.5 as well)

Hypervisor: XenServer 6.2.0 (fully updated, SP1 + a couple more updates). Clients all have XenTools installed.

Print Server: Server 2008R2 (VM) x64. Has "Print Manager Plus" software, also has Citrix Universal Print Server.

New Printer Server: Server 2012R2 (VM) x64. Does not have Print Manager Plus.

Drivers: To be discussed

A More Detailed Description of the Problem

My general view of the problem is that the printer is taking a long time to respond to the program trying to print to it. The main way this appears, is when you click "print" in a program, it tries to contact the printer to get availability, status, capabilities, etc. This takes a long time to finish when the problem occurs, which means when you click "print" the program stops working for 30-60 seconds. The window will show "not responding". Sometimes a box saying "connecting to printer" will appear, but not always.

This problem will happen several times when trying to print, because the computer appears to talk out to the printer several times. First when you go to the print menu to select a printer, then again when you click on the printer itself (to select it) then again when you click print to actually send the job. This means there can be a delay of several minutes for a user trying to print a basic document. This is understandably very annoying.

What I've tried so far

My first thought was that it had something to do with the Citrix Universal Print Server. That's why I built up the 2012R2 server. However the problem is present when mapping printers from that server. This only happens with some printers, other printers work just fine.

Mapping the printers directly (via IP on the client) appears to work fine as well.

Most of our printers are HP, and thus most use the HP Universal Print Driver. This is currently my main suspect, but it's unclear as to why that would cause such a problem with only VMs. Physical Clients (your standard desktop machines) do not have this problem.

I also thought it might be the universal print manager (Client side of the Citrix Universal Print Server), but I have disabled that service and deleted the Citrix Universal Print Driver from a machine and the problem persists. It's possible that the Citrix software still causes some problems, currently installing a vanilla windows 7 machine to test this theory.

Problem occurs whether connected over XenDesktop, Remote Desktop, or Directly through XenCenter Console.

Being an Administrator or not does not appear to have an effect.

Using the FQDN or IP of the server rather than the hostname doesn't appear to have an effect.

I've tried Type 3 and Type 4 drivers (What does that mean?). Both work fine from the print server (that is, the print server never has problems printing test pages). Type-4 drivers are not technically supported on Windows 7, so when a Windows 7 machine tries to connect to a type-4 printer, it is given an "enhanced Point and Print Compatibility driver". These work fine; however, this is not an apples-to-apples comparison, because there is no type-4 HP universal print driver. So the Type-3 not working where the Type-4 does is as much comparing device-specific to Universal as it is type-3 to type-4 (trying to find type-3 device specific drivers). But for what it's worth, the HP UPD (type-3) has the problem, the device specific (type-4) do not.

Older versions of the HP UPD appear to have the same problem. PCL 5 vs PCL 6 vs PS doesn't make a difference either.

For some reason HP doesn't always have device-specific drivers on their website -- they'll just link the UPD. So it's really hard to find a printer that has the UPD, Type-3 Device Specific, and Type-4 Device Specific to do some real testing on. Testing the difference between type-3 DS and type-3 UPD at least..... Type-3 DS driver has the problem as well, at least on my "HP LaserJet 400 M401".

Problem also occurs on one of my terminal servers (2008R2), which is also running the XenDesktop client but is a physical server.

Problem does not appear to occur on another terminal server (2008R2) which is a virtual machine, but is not running the XenDesktop Client. Doing further testing to verify. For all intents, problem does not exist on this machine. You can see the "connecting to printer..." dialog come up for a split second, but it's almost as fast as a physical machine, not enough to make a difference user experience wise. So it looks like we may be looking at the XenDesktop software as the culprit. It's strange that it only has problems with certain (mostly HP) drivers.

As I mentioned, I've also tried using the Citrix UPD, but it shows the same problem. I'd have to do more testing to verify whether the CUPD locks up only when using it to print to printers that otherwise exhibit those symptoms, or whether the CUPD is just broken in general.

For now, I'm getting a clean windows 7 machine built up and will install software one-by-one to determine when the problem starts happening.

....

OK, fully updated windows 7 machine with nothing else on it is ready. My method here is
  1. Map the printer
  2. Wait a minute
  3. Open notepad
  4. Try to print
  5. Wait another minute
  6. Open Printer properties
  7. Unmap Printer
  8. Reboot after each full test (between changing variables, not between each printer)
If either 4 or 6 takes more than about 10 seconds, I'll consider the problem to exist. I try this with three different printers, all of which have exhibited the problem in the past. Here are the variables and results of the tests.

  1. Clean machine
    1. Problem does not exist
  2. Install XenTools
    1. Problem does not exist
  3. Domain Join
    1. Problem does not exist
  4. Move to Correct OU
    1. Problem does not exist
  5. Install XenDesktop Client (VDA)
    1. Problem definitely exists.
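The "more than about 10 seconds" check could be measured instead of eyeballed, at least for the steps that can be scripted. A rough sketch, with some assumptions: Measure-PrintStep is a made-up helper name, the printer path in the usage comment is hypothetical, and it only times scriptable steps such as mapping the printer, not the clicks inside Notepad's print dialog.

```powershell
# Time one step and flag it if it runs past the threshold (10 seconds by default).
# Uses New-Object PSObject so it also runs on the PowerShell 2.0 that ships with Windows 7.
function Measure-PrintStep {
    param(
        [string] $Name,
        [scriptblock] $Action,
        [double] $ThresholdSeconds = 10
    )
    $elapsed = (Measure-Command $Action).TotalSeconds
    New-Object PSObject -Property @{
        Step    = $Name
        Seconds = [math]::Round($elapsed, 1)
        Problem = ($elapsed -gt $ThresholdSeconds)
    }
}

# Usage (hypothetical printer share):
# $net = New-Object -ComObject WScript.Network
# Measure-PrintStep 'Map printer'   { $net.AddWindowsPrinterConnection('\\printserver\HP-Lab-01') }
# Measure-PrintStep 'Unmap printer' { $net.RemovePrinterConnection('\\printserver\HP-Lab-01') }
```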
So I guess that settles it. It's something in the XenDesktop VDA that breaks printing. Keep in mind I did not run any of the XenDesktop Optimizations, so it's not one of those; the problem is within the client itself, or some change it makes without the option not to. So let's see if I can narrow down specifically what's causing it here.

  • This Forum suggests stopping "Net Driver HPZ12" service (some sort of HP monitor thing)
    • Did one better and disabled "Pml Driver HPZ12" as well (another HP monitoring thing)
      • This did not help
    • Let's try disabling the service and rebooting.
      • No Dice
  • This Post on experts exchange says to turn off bidirectional support in the printer properties. Let's try that.
    • It's under the "ports" tab in printer properties. I changed it on the server, then deleted/re-added it to the client.
    • This doesn't appear to have any effect
  • Downloading XenDesktop 7.5 -- just the VDA, I'm not upgrading my whole installation yet. Since this "clean" machine isn't even joined to the XenDesktop Controller, I don't see how that would have any effect anyway.
    • Installed 7.5 VDA, doesn't appear to have had any effect.
    • Just to be sure I removed the device and uninstalled all drivers and tried again
      • Still locks up
  • Disabled Citrix Print Manager Service
    • I've tried this before, but thought I'd try again under 7.5
      • Still locks up
  • Disabled "Citrix Personal vDisk" Service
    • At this point I'm just disabling Citrix Services one-by-one to see if there's any change
    • Still locks up
  • Disabled "Citrix Profile Management" Service
    • Still Locks up
  • To reduce redundancy, disabled each "Citrix Service" one-by-one
    • Still Locks up
  • Tried disabling "allow direct connection to printers" in Citrix Policy.
    • This made it slightly less terrible. It still hangs for a bit while "connecting to printer" but the program doesn't stop responding (or at least, windows doesn't think it has). Not a perfect solution but it's progress at least.
  • Found this forum post, trying the solution listed at the bottom - installing the VDA without the universal printing component
    • Completely uninstalled current VDA first
      • Verified problem had gone away
    • Reinstalled using "XenDesktopVdaSetup.exe /components vda /EXCLUDE "Citrix Universal Print Client" /logpath "c:\ctxinstall.log" /quiet /noreboot"
      •  Sweet baby Jesus I think that actually worked
      • Yep, that appears to solve the problem
I've applied this fix to my main two XenDesktop Terminal Servers (aka ServerOS Hosted Desktops) and it appears to have solved the problem. You no longer get the "connecting to printer..." dialog, the program doesn't go into a not-responding state, and users will hopefully stop breaking things by clicking a bunch of buttons while it appears to be frozen.

If the forum post I linked above is to be believed, the issue is that the XenDesktop client attempts to talk to the universal print server, even when the UPD is not being used. I'm a little skeptical that this is the entire problem, because this error would happen even when using the UPD/UPS. But at any rate it's fixed. Obviously this precludes using the Citrix Universal Print Server in the future, but it's honestly been such a pain to manage/get working that I have to call that a 100% positive effect.







    Friday, August 29, 2014

    iMacros for Firefox Failure Code 0x80500001 (Error code: -1001)

    Solution

    The root issue is the encoding of the files. I've had this problem before with iMacros, but it's never been quite this specific. Usually saving the datasource (the CSV file) as UTF-8 works. But some update to Firefox or iMacros has made it really inflexible. Both the datasource (.csv) AND the macro file (.iim) must be saved as "UTF-8 with BOM". I used Sublime Text to do this, but any full-featured text editor should work.

    For the record, various versions of things I use.
    Firefox : 29.0.1
    iMacros : 8.8.2
    Windows : 8.1 x64
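If there are a lot of files to fix, the re-save can be scripted instead of done in an editor. This is only a sketch: Convert-ToUtf8Bom is a made-up function name, the file paths in the usage comment are hypothetical, and it assumes the files are already UTF-8 or plain ASCII (a file in some other encoding would need to be read with that encoding first).

```powershell
# Rewrite a text file in place as "UTF-8 with BOM".
# Assumption: the file is already UTF-8 (with or without BOM) or plain ASCII.
function Convert-ToUtf8Bom {
    param([string] $Path)

    # .NET uses the process working directory, not PowerShell's, so resolve to a full path
    $fullPath = (Resolve-Path $Path).ProviderPath

    # ReadAllText decodes as UTF-8 by default and drops an existing BOM from the text
    $text = [System.IO.File]::ReadAllText($fullPath)

    # UTF8Encoding($true) makes WriteAllText emit the EF BB BF byte order mark
    $utf8WithBom = New-Object System.Text.UTF8Encoding $true
    [System.IO.File]::WriteAllText($fullPath, $text, $utf8WithBom)
}

# Usage (hypothetical file names):
# Convert-ToUtf8Bom 'C:\iMacros\Datasources\dhcp.csv'
# Convert-ToUtf8Bom 'C:\iMacros\Macros\dhcp.iim'
```

Running it on a file that already has a BOM leaves the file unchanged, so it is safe to run twice.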

    Problem/Full Story

    I often have to fill out web forms over and over again to perform certain tasks. A lot of these web forms are poorly designed at best, and don't support batch-type inputs. So having a program like iMacros is essential for me not wanting to kill myself while filling out a DHCP registration form 200 times.

    I've used iMacros for a number of years and never had too many problems, in Firefox at least; in Chrome the sandboxing makes it an exercise in keyboard-snapping frustration to read/write to files, but that's another story. However, I needed to do a bunch of the previously mentioned DHCP registrations today (this is a system managed by another department, and the web interface is the only way to do it besides submitting a work request, which can take days), and found that the Macro/CSV I had previously used to do this were not working. I received the following error message:

    Error: Component returned failure code: 0x80500001 [nsIConverterInputStream.init], line 4 (Error code: -1001)
    I'd actually run into this error before, or at least one similar to it. iMacros (or possibly Firefox) can be rather picky about the encoding it uses. Previously, saving the .csv I use for inputs as UTF-8 had solved the problem. Today that didn't work though.

    After fiddling around with it for a bit, I found something strange. I created a new macro (.iim) file to see if the other one was corrupt or something, but writing/saving it through Sublime Text (not the built-in iMacros editor) as UTF-8, then opening it in the iMacros editor, just showed a blank file. After trying a handful of different encodings for the macro file, I found one that it would recognize: "UTF-8 with BOM". After saving the file with this encoding through Sublime, it would show up correctly in iMacros. However, I was still getting the same error when I tried to run it. I tried saving the csv file with the same "UTF-8 with BOM" encoding, and then it ran.

    Thursday, August 28, 2014

    Citrix Receiver for Mac "Cannot start the desktop ... OSStatus -1712"

    Solution

    In my case there were non-responsive processes on the Mac client that were causing the problem. To resolve, I closed out of Receiver and closed any active desktop connection. I then brought up the Activity Monitor (command+space to bring up the search, enter "activity monitor"). There were several Citrix processes: one non-responsive process with the name of the personal desktop that wouldn't load, and a few helper processes. I force-quit all Citrix processes, then restarted the Receiver client. Connected to desktop successfully at that point.

    It may not have been necessary to force-quit all Citrix processes, but it doesn't seem to have had any consequences; they started back up when I reloaded Receiver.

    Problem / Full Story

    Had a user this morning that couldn't connect to their Windows desktop over XenDesktop (7.1). User is one of our few Mac users (running Mavericks), and uses XenDesktop to get to Windows applications he needs. When he tried to connect to his Windows 7 machine this morning, he got the following error.

    Cannot start the desktop "Personal Desktop"
     Contact your help desk with this information: The application "Personal Desktop" could not be launched because a miscellaneous error occurred. (OSStatus -1712).
    Odd thing was, he was able to connect to his Windows 8 desktop just fine. So the connection to the server was working, as was the connection to at least one VM. The Win7 machine was showing up as registered and ready in Citrix Studio on the XenDesktop Controller. Win7 appeared to be responsive when interacting with it through XenCenter. I tried restarting the Windows 7 machine but the error persisted. A brief look through the logs on the Win7 machine and the XDC didn't show any errors, so it seemed like the problem wasn't server-side. Had the user log out/close Receiver on his machine and reopen it, but the error continued to occur.

    Up in the user's office I brought up the Activity Monitor and saw the unresponsive process -- see "Solution" above. After killing and restarting all Citrix processes the user was back up and running. Rebooting the Mac probably would've had a similar effect.
     

    Tuesday, August 12, 2014

    a security package specific error occurred - Security-Kerberos EventID 4

    Solution

    Root problem was that there were static DNS entries set for some computers whose IP addresses had changed. Deleting static entries and waiting for changes to propagate out solved the problem.

    Full Story

    Had an issue this morning where some new computers on our network were not getting printers mapped. This is not an uncommon occurrence, because printers, but the cause of the problem was a new one for me. These computers had just been upgraded (new hardware, same hostnames) and seemed to be functioning fine on the domain. The print driver was working fine on other machines, and the usual fix, restarting the print spooler, had no effect.

    Trying to access the Event Viewer on the lab machines I got the error "A Security Package Specific Error Occurred". This error (or a variation) came up trying to access the computer via any WMI / RPC / DCOM method.

    On the print server I had the following error, listed as Level:Error, Source:Security-Kerberos, Event-ID: 4

    The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server MYLAB-04$. The target name used was cifs/MYLAB-02.My.Domain.Com. This indicates that the target server failed to decrypt the ticket provided by the client. This can occur when the target server principal name (SPN) is registered on an account other than the account the target service is using. Please ensure that the target SPN is registered on, and only registered on, the account used by the server. This error can also happen when the target service is using a different password for the target service account than what the Kerberos Key Distribution Center (KDC) has for the target service account. Please ensure that the service on the server and the KDC are both updated to use the current password. If the server name is not fully qualified, and the target domain (MY.Domain.Com) is different from the client domain (My.Domain.Com), check if there are identically named server accounts in these two domains, or use the fully-qualified name to identify the server.

    One thing jumped out here right away: the error is from lab computer 04 (SPN: MYLAB-04$), but the FQDN is listed as computer 02 (cifs/MYLAB-02.My.Domain.Com). So that set off some alarm bells, but I still did some additional research before jumping in.

    Supposedly this error can be caused by a number of things (a Google of "A Security Package Specific Error Occurred" returns about 6 different causes on the first page of results). In my case, as mentioned above, it was a DNS issue. While upgrading these lab machines, the IP addresses we assigned through DHCP changed slightly. Normally, we just let the machines register themselves with the DNS server after they pick up their IP via DHCP; we don't have many static DNS entries. For some reason, these machines had static entries, though, so our DNS server was resolving their hostnames to the old addresses, which now belonged to different machines, and that is what caused the authentication errors. Deleting the static entries and waiting (DNS changes can take a while to replicate) solved the problem.
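A quick way to spot this kind of stale record is to compare what DNS returns for a hostname against the address the machine actually has (from ipconfig on the machine, or the DHCP lease). A minimal sketch, where Test-DnsMatchesAddress is a made-up helper and the hostname and address in the usage comment are hypothetical:

```powershell
# Resolve a hostname and report whether the expected address is among the results.
# Note: this goes through the local resolver cache, so a stale cached answer can
# show up here too (ipconfig /flushdns clears the cache on the machine running this).
function Test-DnsMatchesAddress {
    param(
        [string] $HostName,
        [string] $ExpectedAddress
    )
    $resolved = [System.Net.Dns]::GetHostAddresses($HostName) |
        ForEach-Object { $_.IPAddressToString }

    New-Object PSObject -Property @{
        HostName = $HostName
        Expected = $ExpectedAddress
        Resolved = ($resolved -join ', ')
        Match    = ($resolved -contains $ExpectedAddress)
    }
}

# Usage (hypothetical name and address):
# Test-DnsMatchesAddress 'MYLAB-02.My.Domain.Com' '10.0.5.102'
```

A result with Match = False for a machine that is otherwise healthy on the domain would point toward a leftover static record like the ones described above.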