Convert Text, PCL, PostScript, other formats to PDF

Mon, 04/30/2018 - 17:01 By Dave Brooks

Introduction

I can still picture the first time I saw PDF. It was in an Adobe booth at a trade show I’ve long forgotten, in the mid-nineties. I was familiar with PostScript, so when the salesperson showed me the “new thing” on paper and on several computer screens (Windows and Mac if I remember correctly), I was mightily impressed. As I should have been.

Fast forward to any year in the past several decades, and PDF is a big deal. Companies need to put documents into PDF, and if the PDF is text searchable, that is far better than not.

This article describes the variety of ways RPM Remote Print Manager® (RPM) creates PDFs from different sources.

PCL to PDF

PCL is the print format generally used by Hewlett Packard printers. I've read that roughly half the printers in the world support PCL.

RPM Elite includes a PCL to PDF transform. It is easy to set up; our video "Quick Start Guide with RPM Remote Print Manager" demonstrates the process.

PCL to PDF transform

The PCL to PDF transform has a wide range of options, although as the video shows, you don't need to specify anything to get a perfectly usable result. The underlying tool has more capabilities than we were able to put into a form. If you need them, we can suggest ways to run the tool in a filter transform, which exposes its full functionality.

Regular text to PDF

Regular text or plain text comes with several nuances. Line delimiters can be newline or carriage return/newline. Carriage return by itself often denotes overstrike or bold. The backspace character is sometimes used to "back up" for overstrike or bold.
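To make these nuances concrete, here is a minimal Python sketch of how a converter might split lines and resolve overstrikes. It is an illustration only, not RPM's implementation; the function names and the rule that "the same character struck twice in one column means bold" are assumptions made for the example.

```python
def split_lines(data: str) -> list[str]:
    """Split on CRLF or LF; a bare carriage return stays inside its line."""
    return data.replace("\r\n", "\n").split("\n")


def resolve_overstrike(line: str) -> list[tuple[str, bool]]:
    """Return one (character, is_bold) cell per print column.

    Assumed rules for this sketch: backspace moves back one column,
    a bare carriage return moves to column zero, and striking the same
    character twice in a column marks that column as bold.
    """
    cells: list[tuple[str, bool]] = []
    col = 0
    for ch in line:
        if ch == "\b":
            col = max(col - 1, 0)
        elif ch == "\r":
            col = 0
        else:
            if col < len(cells):
                prev, bold = cells[col]
                if ch == prev and ch != " ":
                    cells[col] = (ch, True)   # same glyph struck twice
                elif ch != " ":
                    cells[col] = (ch, bold)   # different glyph: last one wins
            else:
                cells.append((ch, False))
            col += 1
    return cells
```

For instance, `resolve_overstrike("Hi\rHi")` marks both columns bold, because the carriage return sends the second "Hi" back over the first.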

RPM processes text files and other formats (see the next section) into an internal "text markup" language. A number of our outputs use this "text markup" since it's better to have fewer converters than to have one for text to PDF, another for text to PCL, another for text to GDI print, etc.

Here are typical options for text to text markup:

Text markup transform

This setup is from one of my test queues. By default, we select the option “Calculate font size and auto-rotate orientation”. That works well for many users. I have tens of thousands of test files, many of which fit the 60 lines and 80 columns mold, so I’ve switched the settings for that.

Naturally, you need to understand your data. This transform is based in part on the requests of many people and seems to work satisfactorily.

Now that we have the text markup well in hand, we can move on to the options for transforming it into PDF.

Text markup to PDF transform

Note that the paper type includes paper size. Many of the options in this form are familiar to PDF users.

Regarding the overlays, RPM supports image overlays for PDF and text printing. I happen to have two overlays defined in my installation but am not using either with this PDF transform.

Striping refers to colored horizontal stripes, like the old green bar paper they used on the noisy chain-drive printers from past years. Some users who work with columnar data requested it.

Specialty formats to PDF

RPM supports three formats you might consider to be “specialty”. We have customers who use them, hence RPM supports them.

SCS (SNA Character String) is a binary format from the IBM AS/400. It was a primary print format before IBM developed IPDS, the Intelligent Printer Data Stream. SCS is still in use, and the text markup features are based on the needs of SCS.

ASA and FCFC are text formats that use a print directive in the first column of each line. The directive might call for overprinting, advancing to the next line, leaving a blank line, and so on. ASA used to be called Fortran Carriage Control.
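As an illustration of first-column directives, here is a small Python sketch that separates the ASA control character from the text of each record. The table uses the commonly documented ASA characters; the action names and the fallback for unrecognized characters are assumptions made for this example and do not describe how RPM handles them.

```python
# Commonly documented ASA carriage-control characters.
ASA_ACTIONS = {
    " ": "single",     # advance one line, then print
    "0": "double",     # advance two lines (leaves one blank line)
    "-": "triple",     # advance three lines (leaves two blank lines)
    "+": "overprint",  # do not advance; print over the previous line
    "1": "new_page",   # advance to the top of the next page
}


def parse_asa(record: str) -> tuple[str, str]:
    """Split one record into (action, text).

    Assumption for this sketch: an empty record or an unrecognized
    control character is treated as single spacing.
    """
    if not record:
        return ("single", "")
    return (ASA_ACTIONS.get(record[0], "single"), record[1:])
```

A converter would then walk the records in order, moving the print position according to the action before placing the text.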

RPM translates these formats into text markup. You would follow the transform for SCS, for instance, with a “text markup to PDF” transform to generate PDF. The same applies to ASA and FCFC.

Using a filter program, convert PostScript to PDF

Most people don’t realize this, but the RPM “filter program” takes its name from an experience I had on a project years before Brooks Internet Software. We had a manual we needed to print, but it was PostScript, and we had a very basic Laserjet 2 printer. I got permission to set up Ghostscript in a Unix "printcap" file. In short order, we had our manual.

In the Unix vernacular, a program that transforms output is called a “filter”, which is why I thought of it when creating RPM. My point about Ghostscript is that as far as I know, it’s the go-to program for interpreting PostScript and rendering it to a device or another format. The filter action in RPM doesn’t know or care about Ghostscript any more than the Unix print system did, or the printer for that matter. There may be other, better programs. I'm not personally familiar with Adobe products, but that might be an option.

The point is, RPM can run a program against your print job. If you have a program that creates PDF from another format, you will use it here. I only mentioned Ghostscript because it's well known. I can't say whether it is your best choice, but I do know many of our customers have used it.

Here is the setup for a filter action to run Ghostscript to translate PostScript to PDF:

Filter action to convert PostScript to PDF

There are several discussion points here which I feel are relevant. I'll also offer some troubleshooting tips.

  1. There are two executables in the Ghostscript bin folder you should consider: gswin64.exe and gswin64c.exe

    Ultimately you should try to end up using gswin64c.exe. This is the console build of Ghostscript; it works without bringing up a window of its own, and in a perfect world, you don't need a lot of extra windows running in your system.

    However, as I was troubleshooting this example, I found I needed to turn on interactive processing. That means I selected my user credentials in the "Credentials" list, I clicked "Add User", and I selected "Interact with Desktop". That way I could see the errors Ghostscript was showing me until I got the details of the command line correct.

    I also ran gswin64.exe rather than gswin64c.exe while I was doing that troubleshooting.

    If you run gswin64.exe during your regular processing, that is probably fine too. I don't claim any expertise here; a web search will likely turn up better-informed opinions.

  2. The command line is adapted from another page on our website, "Using Ghostscript with RPM". That page uses an older version of Ghostscript. I found that adding "quit.ps" or "quit" caused a runtime error, and that leaving it off gave me the desired results. Since -dBATCH already tells Ghostscript to exit once it has processed the input files, the extra "quit" is likely unnecessary with that switch. You should try the command with “quit” and then without it to verify which is correct in your environment.

    The final command line I used is:

    -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="%i-%n.pdf" %s

  3. My advice: never use a program folder as the "Working Directory" for any filter setup. I never do.
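When troubleshooting, it can help to run the same Ghostscript command outside RPM so you can read its messages directly. The Python sketch below builds the same switches as the command line above and captures whatever Ghostscript prints. The function names and the idea of passing in the executable path are assumptions for illustration; inside RPM the filter action supplies the job path and output name through its own placeholders.

```python
import subprocess


def ghostscript_args(input_path: str, output_path: str) -> list[str]:
    """Build the switches from the article for PostScript-to-PDF conversion."""
    return [
        "-dNOPAUSE",                    # do not pause between pages
        "-dBATCH",                      # exit after the input is processed
        "-sDEVICE=pdfwrite",            # write PDF
        f"-sOutputFile={output_path}",
        input_path,
    ]


def convert(gs_exe: str, input_path: str, output_path: str) -> tuple[int, str]:
    """Run Ghostscript and return (exit code, combined output).

    gs_exe is whatever executable you chose, for example the full
    path to gswin64c.exe on your system.
    """
    result = subprocess.run(
        [gs_exe, *ghostscript_args(input_path, output_path)],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout + result.stderr
```

A non-zero exit code together with the captured text usually points at the part of the command line that needs fixing.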

Automated PDF print

I use "print to PDF" on my computer from time to time. Just the other day I printed a bunch of reports to PDF from my health portal. I'm using Windows 10, which came with Microsoft's "print to PDF" driver.

The drawback to using the Microsoft print driver with RPM is that RPM runs in the background, and so the process needs to be automated. Where am I going to enter the filename for the PDF document? Any Windows print driver that generates PDF would need to have that information.

Some years ago I experimented with a program called PDFCreator. You print to it like any other printer definition. I remember it having the ability to use a template for the output filename. For my testing purposes, I used the document name, which is part of the print process (something I would know since I programmed that). I didn't need PDFCreator long term; I was only interested in seeing if I could get it to work with RPM. It seems like it did. I stopped using it and haven't thought about it until one of my colleagues reminded me.

If you have a "print to PDF" solution that works like a normal Windows printer, please keep in mind that your product needs the ability to generate a file path without user intervention.
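To show what "without user intervention" might involve, here is a Python sketch that derives an output path from a document name. It is a hypothetical helper, not a description of PDFCreator or RPM: the function name, the character substitutions, and the numbering scheme for avoiding collisions are all assumptions for the example.

```python
import re
from pathlib import Path


def output_path_for(document_name: str, folder: Path) -> Path:
    """Turn a print job's document name into a PDF path, with no prompts.

    Characters that Windows does not allow in filenames are replaced
    with underscores, and a counter is appended if the file exists.
    """
    stem = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", document_name).strip(" .")
    if not stem:
        stem = "document"
    candidate = folder / f"{stem}.pdf"
    counter = 1
    while candidate.exists():
        candidate = folder / f"{stem}-{counter}.pdf"
        counter += 1
    return candidate
```

Whatever product you use, the equivalent logic has to happen somewhere, because a background process has no one to answer a "Save As" dialog.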

PDF outputs, or now that I have PDF, what do I do?

RPM supports a wide variety of operations for your PDF files.

  1. You can archive PDFs to disk. We support both local and network drives assuming you provide credentials for the latter.
  2. You can upload to an FTP server.
  3. Some printers support PDF natively so you could potentially use the raw print, the LPR print or the IP print action for those.
  4. We have at least two articles on this site for using one PDF viewer program or another to print the PDF to a Windows printer, using the filter action.
  5. You can also use the filter action to print using Ghostscript.
  6. Finally, you can send the PDF as an email attachment.

What about inserting a page at the beginning of your PDF document, or appending one?

RPM has several options for inserting or appending a file to your print job, and we also have several options for banners. Can you do this with a PDF? The answer is yes, but it's not as easy as it is with text files.

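PDF files include a cross-reference table that records where each object in the document is stored, along with a trailer that identifies the document's root object, from which the list of pages and the resources the document uses are reached. The table and trailer are typically at the end of the document, and the last lines of the file hold a pointer to the location of the table. Because of this structure, you can't simply concatenate two PDF files the way you can with text files.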
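If you want to see this structure for yourself, the short Python sketch below reads the pointer that a conventional PDF keeps near the end of the file, after the "startxref" keyword. The function name and the choice to look only at the last 1024 bytes are assumptions for the example; it is a way to peek at the layout, not a PDF parser.

```python
import re


def find_startxref(pdf_bytes: bytes) -> int:
    """Return the byte offset recorded after the last 'startxref' keyword.

    In a conventional PDF, this offset is where the cross-reference
    data begins. Only the tail of the file is searched.
    """
    tail = pdf_bytes[-1024:]
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not matches:
        raise ValueError("no startxref found; not a PDF, or the file is truncated")
    return int(matches[-1])
```

Because these offsets count bytes from the start of the file, gluing one PDF onto another would leave the pointers in the second file wrong, which is why merging calls for a tool that rewrites them.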

If you were to search for "merge PDF files" you would probably find many tools that could do this job for you. As you have seen above, you would just set up RPM to run the tool. RPM is "tool agnostic" in that we aren't concerned about which tool you use.

However, the PCL to PDF utility we include with RPM Elite has this very feature. Here is how you set it up. 

Filter action to merge PDFs

The logic of the command line is as follows.

  • pcltool is the name of the program, and it's found in the "pcl2pdf" folder under the RPM install folder
  • the switch is "-mergepdf"
  • the first argument is a list of files, in quotes, separated by a vertical bar "|". There is no separate insert or append option; you simply list the files in the order you want them to appear. I'll demonstrate inserting a file ahead of our job file by making the first entry in the list a static file and the second entry the filter command's job file
  • the second argument is the output file, a PDF

My command line is:

-mergepdf "d:\testing\data\pdf\a2dismod.8.pdf|%s" %u-%f

The first argument includes the string "d:\testing\data\pdf\a2dismod.8.pdf". This is simply a one-page PDF file I happen to have on my system which I used to test this command line.

Note that the job path argument follows this, and the two are separated by the vertical bar "|" as explained above.

Following this, I have %u-%f which translates to my username followed by the filename part of %s, which is the full job path. I merely did this to distinguish the result file from %s. Otherwise, the program would not leave the result file in place when it finished. You might find that "result-%f" would be just as effective. Please see the filter action reference for more possibilities.
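If you generate this command line from a script, the argument layout described above can be sketched in Python as follows. The helper name is hypothetical, and the sketch only assembles the arguments in the order the article describes; it assumes the tool accepts them exactly as they appear on the command line above.

```python
def merge_args(input_files: list[str], output_file: str) -> list[str]:
    """Build the -mergepdf arguments: switch, file list, output file.

    The input files are joined with a vertical bar, in the order they
    should appear in the merged document.
    """
    if not input_files:
        raise ValueError("at least one input file is required")
    for name in input_files:
        if "|" in name:
            raise ValueError(f"file name may not contain '|': {name!r}")
    return ["-mergepdf", "|".join(input_files), output_file]
```

To append rather than insert, you would put the job file first in the list and the extra page after it.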