Traditionally, filled paper forms are transferred and stored as a kind of all-inclusive data unit. Breaking the data out of a paper form means manually copying it to another location. So it is very common to think of a form and the data entered into it as a single entity, i.e., that the form is the data. But of course, this is not true. The form is a user interface for collecting the data, and in many cases for presenting it. In the end, it is the data in which we are interested. The form is just a way to get the data. In practice, this idea of form/data separation is much more evident in electronic forms than in paper forms, mainly because it is extremely easy to move data into and out of an electronic form.
This raises two questions: how exactly is data moved into and out of forms, and what does the data look like when it is not in the form? These are the two critical issues when it comes to creating real-world form/data solutions: the transfer mechanism and the data format, which are the topics covered here.
For PDF forms in Acrobat/Reader there are many different ways to move the data in a variety of data formats. It is important to have this wide range of options in order to satisfy a wide range of workflows. Any particular solution is heavily dependent on exactly how both the form and the data are used.
Form data, simply put, is a linear set of Name/Value pairs. This is true for form data in all its different incarnations, from the form itself, to the transfer mechanisms, to all the data formats.
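The name/value idea can be sketched in a few lines of script. The function below assumes a `doc` object that exposes `numFields`, `getNthFieldName()`, and `getField()` in the way the Acrobat JavaScript Doc object does. The `makeMockDoc` helper and its sample data are hypothetical and exist only so the sketch can be tried outside Acrobat.

```javascript
// Collect every data field on a form into a flat list of name/value pairs.
// "doc" is assumed to behave like Acrobat's Doc object.
function collectFormData(doc) {
  var pairs = [];
  for (var i = 0; i < doc.numFields; i++) {
    var name = doc.getNthFieldName(i);
    var field = doc.getField(name);
    // Skip anything that carries no data, such as push buttons.
    if (!field || field.type === "button") continue;
    pairs.push({ name: name, value: String(field.value) });
  }
  return pairs;
}

// Hypothetical stand-in for a PDF document, for use outside Acrobat.
function makeMockDoc(fields) {
  return {
    numFields: fields.length,
    getNthFieldName: function (i) { return fields[i].name; },
    getField: function (name) {
      for (var i = 0; i < fields.length; i++) {
        if (fields[i].name === name) return fields[i];
      }
      return null;
    }
  };
}

var sampleDoc = makeMockDoc([
  { name: "FirstName", type: "text", value: "Ann" },
  { name: "Quantity", type: "text", value: 3 },
  { name: "SubmitButton", type: "button", value: "" }
]);
var samplePairs = collectFormData(sampleDoc);
```

Every value is converted to a string here because most data formats carry text. A real form may need more careful handling for field types such as list boxes with multiple selections.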
Names are the primary way that data is identified, so they need to be unique within their context. On a PDF form, the data name is the name of the form field. When data is extracted to a data format, it is the field name that is used as the name in the format. For example, if PDF form data is extracted to an Excel spreadsheet, then field names become the column names on the spreadsheet. Names pervade the entire data workflow process. It is important to start out with names that make sense so the data can be easily identified and handled later in the process.
There are a couple of basic best practices to follow when creating names.
There are of course exceptions to the strict field name/data name relationship discussed above. If a PDF script or custom program is being used to transfer data, then that script/program can provide automatic renaming. Some applications, like Excel, provide name mapping for importing data. An XSLT file can be used to remap XML data. And even in PDF, every field has a submitName property, which would imply that data submission uses this name instead of the field name. Unfortunately, the submitName property is only used when submitting data in HTML format. If this property were available for all submission and data export operations, then the data name could be decoupled from the field name. But since this is not the case, name your fields carefully.
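As a minimal sketch of the script-based renaming mentioned above, the function below remaps data names through a lookup table before the data is passed on. The function name, the map, and the sample names are all hypothetical. Names not listed in the map pass through unchanged.

```javascript
// Return a copy of the name/value pairs with names translated through nameMap.
// Names that do not appear in nameMap are kept as they are.
function renameData(pairs, nameMap) {
  var out = [];
  for (var i = 0; i < pairs.length; i++) {
    var oldName = pairs[i].name;
    var hasMapping = Object.prototype.hasOwnProperty.call(nameMap, oldName);
    out.push({
      name: hasMapping ? nameMap[oldName] : oldName,
      value: pairs[i].value
    });
  }
  return out;
}

// Hypothetical mapping from form field names to spreadsheet column names.
var renamed = renameData(
  [{ name: "txtFirst", value: "Ann" }, { name: "Email", value: "ann@example.com" }],
  { txtFirst: "FirstName" }
);
```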
Within the context of a PDF form, data is typically a value representing a number, text, a date, or true/false. All these types of values are relatively small (in data size) and can be easily converted into a text string. Text representation is important because most data formats and data handling mechanisms are intended to handle text. However, it is possible with a PDF to submit both image and raw file data, which is neither text nor small. These are special cases that require specific types of data formats and data handling, and will be covered separately. Most of the discussion and other articles on this topic will only be about handling typical form data values, for the standard form fields.
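The conversion of typical values to text might look like the sketch below. The function name and the choice of an ISO-style date string are assumptions made for illustration. A real workflow would use whatever date format the receiving system expects.

```javascript
// Convert a typical form value (number, text, date, true/false) to a text string.
function valueToText(value) {
  if (value === null || value === undefined) return "";
  if (typeof value === "boolean") return value ? "true" : "false";
  // Dates are written as YYYY-MM-DD in UTC; this format is an arbitrary choice.
  if (value instanceof Date) return value.toISOString().slice(0, 10);
  return String(value);
}

var asText = [
  valueToText(12.5),
  valueToText(true),
  valueToText(new Date(Date.UTC(2020, 0, 15))),
  valueToText("hello")
];
```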
PDF form fields are enshrined in the ISO 32000 specification, so all compliant PDF viewers implement them in exactly the same way. But data handling is largely dependent on the PDF viewer and/or tool set used to manipulate the PDF form. PDF viewer/tool vendors are not required to implement JavaScript or data handling in the same ways as Acrobat. However, there are practices, formats, and standards that are common for all data handling, and all the good PDF viewers/tools also follow Adobe's lead in one way or another. So, while the techniques discussed here are specific to Adobe Acrobat/Reader, many may also work/integrate with other applications. Compatibility is not guaranteed or implied.
In most cases the functionality for importing and exporting is complementary, i.e., data moves in both directions with the same mechanisms. Where Acrobat provides a function/mechanism for exporting data to a particular format, it also provides a function/mechanism for importing from that format. But not all of these functions/mechanisms are equal. For example, anything that touches the user's local file system requires privilege. Also, some are not available to Acrobat Reader without special Reader Extensions, or not at all. When developing a data workflow solution it is important to understand these limitations. Below is a table that lists each mechanism and the associated restrictions. The first two are manual operations the user performs from the Acrobat user interface, which is why "Requires Trust/Privilege" is set to NA (not applicable): the user always has privilege for manual operations. Each mechanism is discussed in more detail in the associated articles.
Function/Mechanism | Formats | Reader | Requires Trust/Privilege | Note |
---|---|---|---|---|
Drag & Drop data file on PDF | FDF, XFDF | Yes | NA | This is a way to quickly populate a form. |
Menu Import/Export | FDF, XFDF, XML, TXT | No | NA | In Acrobat DC Pro, the data menu items are available in Prepare Form mode on the More menu. |
Form Submit | FDF, XFDF, XML, XFD, XDP, HTML | Yes | No | Requires Server Script. Submits are always two way. Both import and export at the same time. |
JavaScript Import/Export | FDF, XFDF, XML, XDP, TXT | Requires Rights | With File Path | |
JavaScript Read File | Any Parse-able Text File | Yes | Yes | No Export, functions read raw file data |
JavaScript Data Move | JSON, other simple formats | Yes/No (depends on method) | Yes/No (depends on method) | Move data in/out of JS accessible location, such as global object/document metadata |
IAC (external VB app) | NA | No | NA | Direct programmatic access to fields |
Plug-in | Any | Requires Special Enabling | No | Plug-ins can do anything |
In the standard and most general form usage model, shown in the diagram at the top of this page, an arbitrary remote user fills out the form and then submits or emails the form/data back to the form owner. The third option in the table above, Form Submit, is the only mechanism built into PDF that handles this case without any assistance from a script or application on the user's system. This mechanism is also implemented by a wide range of PDF viewers.
All of the other export data mechanisms listed in the table above are more suitable for form data automation, or for a closed environment (where the users are known and special applications/scripts can be installed on the user's computer).
Most of the Import functionality in Acrobat parallels the export functions, but there are some interesting and useful variations. For example, the first option in the table is "Drag and Drop". Both the FDF and XFDF data formats are PDF specific formats, so Acrobat immediately recognizes them as form data. This means that they can be dragged and dropped directly onto a file open in Acrobat. Both also contain links to the original PDF form. In most cases, simply opening one of these data files will open the original form and populate it.
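To give a sense of what such a data file contains, the sketch below builds a simple XFDF document from a list of name/value pairs. It is a sketch under assumptions: only flat (non-hierarchical) field names are handled, the helper names and sample data are hypothetical, and the optional `f` element is what carries the link back to the original PDF.

```javascript
// Escape characters that have special meaning in XML.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build an XFDF string from name/value pairs.
// pdfPath is optional; when given, it is written as the link to the source form.
function buildXfdf(pairs, pdfPath) {
  var lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">'
  ];
  if (pdfPath) {
    lines.push('  <f href="' + escapeXml(pdfPath) + '"/>');
  }
  lines.push("  <fields>");
  for (var i = 0; i < pairs.length; i++) {
    lines.push(
      '    <field name="' + escapeXml(pairs[i].name) + '">' +
      "<value>" + escapeXml(pairs[i].value) + "</value></field>"
    );
  }
  lines.push("  </fields>");
  lines.push("</xfdf>");
  return lines.join("\n");
}

// Hypothetical sample data.
var xfdfText = buildXfdf(
  [{ name: "Company", value: "A & B" }, { name: "Quantity", value: "3" }],
  "order.pdf"
);
```

Hierarchical field names (for example `Billing.City`) are represented in XFDF as nested `field` elements, which this sketch does not attempt.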
Another example is the fifth option in the table, "JavaScript Read File". The Acrobat scripting model provides a couple of functions that read raw file data. A script can literally open any file on the user's system (or in a file attachment) and parse data out of it. For JavaScript, the ability to parse data is usually limited to plain text files.
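The parsing half of this approach can be sketched as a plain function that turns tab-delimited text into data sets. How the text is obtained is left out. In Acrobat it would typically come from a privileged call such as `util.readFileIntoStream()` followed by `util.stringFromStream()`. The function name and sample text below are hypothetical, and the sketch assumes the first line holds the field names and that values contain no tabs or line breaks.

```javascript
// Parse tab-delimited text into an array of data sets (one object per row).
// The first non-empty line is treated as the list of field names.
function parseTabDelimited(text) {
  var lines = text.split(/\r\n|\r|\n/);
  var names = null;
  var rows = [];
  for (var i = 0; i < lines.length; i++) {
    if (lines[i] === "") continue; // ignore blank lines
    var cells = lines[i].split("\t");
    if (!names) {
      names = cells;
      continue;
    }
    var row = {};
    for (var c = 0; c < names.length; c++) {
      // Missing trailing cells become empty strings.
      row[names[c]] = c < cells.length ? cells[c] : "";
    }
    rows.push(row);
  }
  return rows;
}

var parsedRows = parseTabDelimited("Name\tQty\r\nWidget\t3\nGadget\t\n");
```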
There are three main uses for importing data into a form.
It's often the case that data needs to be presented in a different way than how it was collected. To do this with PDF, the form data is exported using any of the standard methods and then imported into a different form that uses the same form field names. The form field names provide the data mapping, from one form into another. There are many variations on this idea, such as using data from several different forms, and custom scripts that perform special data handling when fields have complex configurations and/or don't have the same names as the original data. One popular variation on this is "Variable Data".
Variable data means consecutively loading different data sets into the same form, where a data set could be a row in a spreadsheet or database. After each data set is imported, the form is saved under a different name, printed, or emailed. This type of operation is used to create form letters, invoices, receipts, and many other types of documents. This technique is also commonly called a "Mail Merge" and there are many 3rd party tools for doing it inside and outside of Acrobat. It can also be done in Acrobat using a custom automation script. Any of the programmed techniques from the table above could be used to create a Variable Data solution.
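The core loop of a variable data script can be sketched as follows. It assumes a `doc` object whose `getField()` returns a field with a writable `value`, or null when no field has that name, as the Acrobat Doc object does. The `saveCopy` callback is hypothetical: in a real script it would wrap a privileged save, print, or email call, which is not shown here.

```javascript
// Load each data set into the form in turn and hand the filled form to saveCopy.
// Returns the number of data sets processed.
function runVariableData(rows, doc, saveCopy) {
  var count = 0;
  for (var r = 0; r < rows.length; r++) {
    var row = rows[r];
    for (var name in row) {
      if (!Object.prototype.hasOwnProperty.call(row, name)) continue;
      var field = doc.getField(name);
      // Data names with no matching field on the form are ignored.
      if (field) field.value = row[name];
    }
    saveCopy(doc, r); // hypothetical: save under a new name, print, or email
    count++;
  }
  return count;
}
```

Keeping the output step in a callback means the same loop can serve for saving, printing, or emailing, depending on what the workflow needs.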
There are many reasons for pre-filling a subset of fields on a form. As an example, consider an order form, such as the one shown below. This screenshot shows the form open in Acrobat Professional, with the Attachments panel on the left and the Add-ons panel on the right. There are 3 sets of fields on this form that use some type of automatic "pre-fill". Each one is done in a different way, representing the 3 general locations from which data can be imported/acquired.
There are a large variety of locations where data can be exported to and imported from. The general categories of these locations are outlined in the "Pre-Filling Form fields" section above, essentially external to Acrobat/PDF and internal to Acrobat/PDF. The range and flexibility of how these locations can be used depends on the particular mechanism used.
As noted in the "Exporting Data" section above, the last two entries on the table, IAC and plug-in, are both completely custom solutions, so they have the greatest flexibility/capabilities of all the data transfer mechanisms. But, they also cost the most to develop, and solutions using these mechanisms are generally restricted to Acrobat Professional/Standard. Many solutions from 3rd party vendors will use one of these mechanisms.
The manual methodologies at the top of the table are restricted to accessing data files on the user's local file system. However, the local file system could include networked drives as well as remote (virtual) folders that are mapped to the local file system.
The JavaScript model has functionality for accessing the complete range of data storage options (as discussed in the pre-filling example), but any one function/method has limitations, and this is where the discussion is focused.
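As one small illustration of moving data to a script-accessible location, the sketch below stores a data set as a JSON string on a plain object and reads it back. The `store` parameter stands in for a location such as Acrobat's global object. The function names and sample data are hypothetical, and the sketch assumes a `JSON` object is available, which is not the case in older Acrobat versions.

```javascript
// Save a data set (array of name/value pairs) under a key as JSON text.
function storeDataSet(store, key, pairs) {
  store[key] = JSON.stringify(pairs);
}

// Read a data set back; returns null when nothing is stored under the key.
function loadDataSet(store, key) {
  var text = store[key];
  return typeof text === "string" ? JSON.parse(text) : null;
}

// A plain object is used here in place of a real storage location.
var demoStore = {};
storeDataSet(demoStore, "orderData", [{ name: "Quantity", value: "3" }]);
var restored = loadDataSet(demoStore, "orderData");
```

Storing the data as a single text string keeps the storage location simple: it only ever has to hold text, whatever the shape of the data set.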
Not all data formats are equal. Each one has different features that will determine its suitability for a particular solution. The table below provides a brief description of the formats most commonly used with Acrobat and PDF forms.
The second column, "Native Acrobat", is marked "Yes" when there is a JavaScript function for importing and exporting data in this format.
The third column, "Data-sets", indicates the number of individual data sets that can be stored in the format: 1 or many. For this discussion, a single data set is all the data from a single form. When exporting data with the JavaScript functions, Acrobat creates or overwrites a file with a single data set. It will never add a data set to an existing data file, even when the file format allows for multiple data sets.
Format | Native Acrobat | Data-sets | Description |
---|---|---|---|
FDF | Yes | 1 | FDF (Forms Data Format) is Acrobat specific and not used by other data handling applications, unless they were designed specifically to handle PDF data. Adobe developed it for tight integration with PDF workflows. It not only stores data, but also stores information about the original document and can transfer comment and review data, update content in PDFs, and install and run scripts, among other things. If there is a need to transfer raw file and/or image data, then this is the only data format that will handle these tasks automatically through JavaScript. It is, however, a difficult format to create and parse outside of Acrobat, so it is most useful for Acrobat-centric workflows. At one time Adobe provided a programming toolkit for handling this format, but unfortunately that is no longer the case. |
XFDF | Yes | 1 | This is the XML version of FDF. Much easier to create and parse, but provides many fewer features. Used mostly for transferring form data and comments. |
TXT | Yes | Many | For data, a text file usually means "Tab Separated Values". This is a common text based format similar to CSV. Each row in the file is a different data set. Recognized by most applications that handle data. This is the only format where the JavaScript function allows data to be imported from any data set in the file. |
CSV | No | Many | Common and very old text based format, where each row is a different dataset. Recognized by most applications that handle data. Acrobat does not provide specific JavaScript functions for handling this format, so it requires custom scripting. However, one of the data export menu items writes to this format for merging data from several forms. CSV is automatically recognized by Excel as a single page in a spreadsheet, so it is an excellent format for importing form data into Excel. |
XML | Yes | 1 | XML is a general purpose, text based data format. Unlike the other data formats discussed above, it is capable of representing complex hierarchical data structures. This format is capable of holding more than one data set, but when specifically selecting "XML" as the format, Acrobat exports a single data set with an ".xml" file extension and uses a simple hierarchical grammar based on field group names. Other formats listed here are XML based, but use a more complex grammar and are saved with a different file extension. |
XFD | Yes | 1 | XML forms format that can also contain data. Created for what became Adobe LiveCycle Forms, now AEM forms. It is a proprietary Adobe XML based form sold into the enterprise market. Looks like PDF on the outside, but isn't PDF on the inside. Acrobat will import/export this format with regular PDF forms, but it's only really useful for AEM forms on the AEM server tools. |
XDP | Yes | 1 | Another XML data/form format for AEM forms. Adobe created this one, primarily as a data format that can contain the original XML form (not PDF form). Acrobat will also import/export this format with regular PDFs, but like XFD it is not very useful in this context. |
JSON | No | Many | JavaScript Object Notation. Quite literally a text string of the JavaScript code for creating a JavaScript Object. Not very efficient with size, but very easy to transmit, store, create and evaluate in JavaScript. Popularized in web browser scripting, it's now used everywhere. In Acrobat the official JSON toolset was added in the DC version. To use this in previous versions use the object.toSource() and eval() functions. The toSource() function does not create strict JSON format, but it works if the JSON is only parsed with the eval() function. |
Excel | No | Many | The Excel file format is proprietary to Microsoft, although the specification is public and anyone can create Excel files. Acrobat does not currently provide any way to interact directly with Excel or Excel files. But, there are three indirect ways to get data into Excel. 1) export to CSV, TXT, or XML and import this file in the Excel app. 2) Write a custom Excel Add-in with VBA that uses the Acrobat IAC to acquire form data. 3) Write an Acrobat plug-in that either writes Excel format directly, or interacts with the Excel app. |
HTML | Yes | 1 | This is the data format in an HTTP Post when an HTML form submits to a web server. It's a very simple name/value pair text format. Acrobat provides the ability to use this format on a form submit so that a PDF form can be submitted to the same server script that would be used for a web form. Unfortunately, the return data needs to be something Acrobat understands. This is where the scheme usually fails because Acrobat does not understand HTML, except to convert it to a PDF. |
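Since CSV is one of the formats that needs custom scripting, here is a minimal sketch of writing data sets to CSV text. The function names and sample data are hypothetical. The quoting follows the common convention of wrapping a cell in double quotes when it contains a comma, a quote, or a line break, and doubling any embedded quotes.

```javascript
// Quote a single cell if it contains a comma, double quote, or line break.
function csvCell(value) {
  var text = String(value);
  if (/[",\r\n]/.test(text)) {
    text = '"' + text.replace(/"/g, '""') + '"';
  }
  return text;
}

// Write a header row of names followed by one row per data set.
// A name missing from a data set is written as an empty cell.
function toCsv(names, dataSets) {
  var lines = [];
  var header = [];
  for (var n = 0; n < names.length; n++) header.push(csvCell(names[n]));
  lines.push(header.join(","));
  for (var r = 0; r < dataSets.length; r++) {
    var cells = [];
    for (var c = 0; c < names.length; c++) {
      var v = dataSets[r][names[c]];
      cells.push(csvCell(v === undefined || v === null ? "" : v));
    }
    lines.push(cells.join(","));
  }
  return lines.join("\r\n") + "\r\n";
}

var csvText = toCsv(
  ["Name", "Note"],
  [{ Name: "Smith, J", Note: 'said "hi"' }, { Name: "Lee" }]
);
```

Appending a new data set to an existing file amounts to writing one more row without the header, which is why this format suits collecting data from many forms.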
Loads form data from a CSV or Tab Delimited (.txt) file into form fields on the current PDF. Complete, ready to use tool, no script editing required. User can search data and manually select the data line to import. Works in Reader.
Copies all fields in one PDF into matching fields in another PDF. Works in Reader.
Acrobat can import and export data to/from a Tab Delimited Text file, which is one of the formats recognized by Microsoft Excel. This package demonstrates the import/export process.
This Automation tool uses data from an external XML file to populate fields on a LiveCycle or AcroForm PDF.
A set of scripts that demonstrate using the Global Object, which is used to share and persist data.
Tool for testing the HTTP JavaScript function.
Collect data from any PDF form into a CSV file. New data can be appended at any time and a log is maintained of all files collected.
How exactly do you create a PDF form with interactive fields? There are basically two steps. Create a Static PDF Form - Use any document creation tool to create the layout and design of your form, th.
Auto-Populating Form Fields from a Drop-Down List (ComboBox): Scripts and techniques for setting the values of form fields automatically from a selection on a drop-down list.
This article presents techniques and scripts for automatically setting the list entries in a drop-down (combobox) field from a selection in another drop-down field. Includes sample files.
External data, i.e., data outside of Acrobat or a PDF file, is often a very important part of a workflow process. For example, information on customers, products, employees, etc. is typically stored in Excel files, databases, or on a server. One of the most common issues with automating such a workflow process is getting the data from the external file or data source into the automation script. This article provides techniques and script examples for acquiring external data.
Setting Up an Excel File for Database Access: Excel is probably the most used desktop data tool. And even though it is not a real database, it can be treated as one if it is set up and handled properly. This article covers the specifics of this process as it relates to importing and exporting data from/to a PDF in Acrobat.
Importing and Exporting Excel Data: This article explains exactly how to transfer data, in both directions, between an Excel file and Acrobat. Scripts are provided for importing and exporting in a variety of scenarios, including a looping scenario for performing variable data operations and mail merge.
Using Excel™ with Acrobat™, PDF, and LiveCycle: There are several different ways Acrobat, PDF, and LiveCycle forms can exchange data with an Excel spreadsheet. This series of articles outlines the details of the different methodologies and provides several variations on the code for implementing each.
These free sample PDF files contain scripts for common, complex, and interesting scripting tasks in Acrobat. Many more are available in the Members Only Download Library.