The problem setup
We recently heard from a customer using RPM Remote Print Manager ("RPM"). Starting with a certain version, RPM would crash without leaving any error message or warning.
RPM logs various operations, but in these cases the log would stop unexpectedly, and the final log entry was inconsistent. You would hope the log could show you the last thing you did successfully, or report an error message from a failed operation. We weren't getting even that.
Without logs to help, I studied their setup and found that the user had 12 queues, 11 of which used a "text print" operation. They also had 11 printers set up, one per queue. The bulk of the jobs used 3 or 4 of these queues over some time.
RPM and text printing
When RPM sends a text print to a Windows printer, it follows the same pattern that every other program on Windows follows:
- Once you identify which printer you want to use, Windows gives you a device context for that printer. The device context means every operation that follows is for that printer until you close the device context.
- Using the device context, you then "start a document." This means you are "creating" a particular document by name on this printer.
- Everything you do after, such as allocating fonts and putting text on the page, is for that printer and that document alone.
- Finally, you "end the document" and "close the printer."
Remember that every program on Windows that prints follows this same logical sequence.
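To make the order of operations concrete, here is a minimal sketch in Python. It does not call Windows at all: `PrinterSession` and `print_text_job` are hypothetical stand-ins that only model the sequence described above and refuse calls made out of order. The comments name the GDI functions that conventionally play each role (CreateDC, StartDoc, TextOut, EndDoc, DeleteDC); whether RPM uses exactly those calls is an assumption.

```python
class PrinterSession:
    """Hypothetical model of one printer's device context (no real printing)."""

    def __init__(self, printer_name):
        # Comparable to obtaining a device context for one printer (CreateDC).
        self.printer_name = printer_name
        self.document = None
        self.closed = False
        self.log = [f"open {printer_name}"]

    def start_document(self, name):
        # Comparable to StartDoc: everything after this belongs to this document.
        if self.closed or self.document is not None:
            raise RuntimeError("cannot start a document in this state")
        self.document = name
        self.log.append(f"start {name}")

    def draw_text(self, text):
        # Comparable to selecting fonts and calling TextOut on the page.
        if self.document is None:
            raise RuntimeError("no document has been started")
        self.log.append(f"text {text}")

    def end_document(self):
        # Comparable to EndDoc.
        if self.document is None:
            raise RuntimeError("no document to end")
        self.log.append(f"end {self.document}")
        self.document = None

    def close(self):
        # Comparable to releasing the device context (DeleteDC).
        if self.document is not None:
            raise RuntimeError("end the document before closing the printer")
        self.closed = True
        self.log.append(f"close {self.printer_name}")


def print_text_job(printer_name, doc_name, lines):
    """Run the full sequence once and return the recorded steps."""
    session = PrinterSession(printer_name)
    session.start_document(doc_name)
    for line in lines:
        session.draw_text(line)
    session.end_document()
    session.close()
    return session.log
```

Calling `print_text_job("Printer A", "report", ["hello", "world"])` returns the steps in the order open, start, text, text, end, close. The start and end steps are the ones that matter later in this article.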
Isolating the problem
Next, I researched known vulnerabilities in the Windows print process. I discovered that the print drivers associated with the Windows print spooler may issue a fault during the start and finish phases of a job. If the driver issues a fault, Windows might end the process, not merely return an error.
This seemed to be exactly what the customer was experiencing. So, I set up a test on my system. Since I have a fast CPU and a lot of memory, I run 16 print processing threads in RPM instead of the default 5. I have a text print queue that uses an HP print driver to convert text to PostScript and then sends that result back to RPM rather than to a printer.
I have nearly 50 thousand test files I use regularly for text processing. I started sending 2000 at a time into my text print queue. Within an hour, I was able to reproduce the problem the customer experienced.
Why the customer experienced the problem now
With this test, I am subjecting the Windows print system to around 16 concurrent requests to process jobs.
Most of the processing in any print job will be text and fonts, but some crossover will inevitably occur where jobs are starting or finishing. It seems the customer had not experienced this problem in the past because I had only recently streamlined RPM's performance. It suddenly became more likely that the Windows print spooler would be "stressed," leading to the symptoms.
Resolution, part 1: isolation
First, I added a "mutual exclusion" object that automatically unlocks itself when it goes out of scope, that is, when its block of code completes. I used this when we start a print job and when we finish one, which are the critical sections discussed above. I left the regular processing of text, fonts, pages, etc., alone.
This way the open/close sequences run one at a time, but the regular printing functions can still run concurrently. There is no purpose in having a fast machine capable of running many threads at a time if everything is serialized.
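The idea can be sketched in Python, where a `with` block plays the role of the scoped lock: the lock is released automatically when the block exits. This is an illustration only, not RPM's code. The names `document_boundary_lock`, `run_job`, and `run_jobs` are hypothetical, and the `time.sleep` calls stand in for real driver and rendering work.

```python
import threading
import time

# Hypothetical global lock shared by every print thread.
document_boundary_lock = threading.Lock()

# Bookkeeping used only to observe how many threads overlap at a boundary.
_stats_lock = threading.Lock()
_inside_boundary = 0
_max_inside_boundary = 0


def _boundary_step():
    """Stand-in for starting or ending a document; records any overlap."""
    global _inside_boundary, _max_inside_boundary
    with _stats_lock:
        _inside_boundary += 1
        _max_inside_boundary = max(_max_inside_boundary, _inside_boundary)
    time.sleep(0.001)  # pretend the driver is busy
    with _stats_lock:
        _inside_boundary -= 1


def run_job(job_id, completed):
    with document_boundary_lock:  # released automatically when the block exits
        _boundary_step()          # "start the document"
    time.sleep(0.002)             # text, fonts, pages: deliberately not serialized
    with document_boundary_lock:
        _boundary_step()          # "end the document"
    with _stats_lock:
        completed.append(job_id)


def run_jobs(job_count=16):
    """Run job_count jobs on separate threads; return (completed ids, max overlap)."""
    completed = []
    threads = [
        threading.Thread(target=run_job, args=(i, completed))
        for i in range(job_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return completed, _max_inside_boundary
```

With the lock in place, the second value returned by `run_jobs()` stays at 1: only one thread is ever inside a start or end step, while the middle of each job still overlaps freely with other jobs.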
Second, I made this mutual exclusion object global so that it also protects the sequences in the raw print code that open a document on a printer. In raw print operations, we don't engage the print driver.
Resolution, part 2: performance hints
I also noticed that my 16 concurrent print jobs would run somewhat slower after 20 minutes or longer.
For a moment, I considered lowering the thread count, but that would affect every operation in RPM. Instead, I changed the "Max Use" setting for the print device. By default, "Max Use" is set to zero, which means the RPM job scheduler does not take concurrent device use into account. If it is set to a nonzero value, the scheduler won't select new jobs that use that device until some current jobs are complete.
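As a rough illustration of that rule, here is a small Python sketch of a scheduler pass. RPM's real scheduler is not shown in this article, so the function `select_jobs`, its parameters, and the exact comparison against the limit are all assumptions made for the example.

```python
def select_jobs(pending, in_use, max_use, free_threads):
    """Pick the jobs to start now (hypothetical model, not RPM's scheduler).

    pending:      list of (job_id, device_name) in queue order
    in_use:       dict of device_name -> jobs currently using that device
    max_use:      dict of device_name -> limit; 0 or missing means "no limit"
    free_threads: number of idle print processing threads
    """
    in_use = dict(in_use)  # work on a copy so the caller's counts are untouched
    selected = []
    for job_id, device in pending:
        if len(selected) >= free_threads:
            break
        limit = max_use.get(device, 0)
        if limit and in_use.get(device, 0) >= limit:
            continue  # device is at its limit; leave this job queued for now
        in_use[device] = in_use.get(device, 0) + 1
        selected.append(job_id)
    return selected
```

For example, with seven jobs already using a device named "hp" and a limit of 8, only one more "hp" job is selected, while jobs for devices without a limit are unaffected. The thread count stays the same for everything else in the system.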
In the RPM UI, I changed the usage setting on the printer device to 8. You do this by going to the View menu and selecting Devices. Double-click on a device to get the settings.
I sent a few thousand more jobs and noted that the throughput seemed "crisp" throughout. Ultimately, I settled on 12 as a number that seemed to work for my system and workload.
Your results may vary, but if you would like to keep sustained performance, you might try this.
Note: I suggest this because the Windows print system is a resource that we have few ways to tune.
Final thoughts
First, if you have problems with our RPM product, please contact our technical support.
Second, we work on problems, even if they turn out to be somewhat challenging.