American Printer's mission is to be the most reliable and authoritative source of information on integrating tomorrow's technology with today's management.

Database basics

May 1, 2005 12:00 AM


         Subscribe in NewsGator Online   Subscribe in Bloglines

Prepress

Like a cartoon elephant convulsing upon seeing a mouse, many printing companies panic at the prospect of accepting a customer’s database file. The same companies might routinely use layered Adobe PhotoShop documents or complex Macromedia Freehand illustrations, but quake in their boots when confronted with a few thousand mailing addresses in FoxPro.

Unfortunately, some fraidy cats are missing out on everything from simple addressing jobs to profitable personalized marketing campaigns.

Columns, rows, records and fields (oh my!)
Database refers to any method for organizing and storing information—from a handwritten list in a three-ring binder to the complex infrastructure of management information systems. A discrete collection of information (such as a list of employees and their addresses) is referred to as a data file. This information can be accessed using two common tools: spreadsheets and database programs.

A spreadsheet program (such as Microsoft’s Excel), displays information as a table made up of columns and rows. Each vertical column is intended to store a single type of data (such as "First Name" or "State"), while each row contains all the data accumulated for a single person or subject. These horizontal collections of data describing a single entity are known as records, while the vertical columns of homogenous entries are called fields.

Nearly every data file is based on this table concept and stored within a proprietary format developed by a software manufacturer. Computers running MS Windows, UNIX or Mac OS X identify these formats through a unique filename extension, such as .xls for Excel or .dbf for IBM’s DBASE. In addition to storing data in its own internal format, most database programs also can export data to your choice of several proprietary formats.

Exporting data as plain text is another common feature&#8212each record is separated by a return; individual records are separated ("delimited") by a specific character. Popular text formats for data files include the "comma separated values" format (.csv) and the "tab-delimited" format (.tab or .txt). Some older mainframe computers only can export data as a text file in which each record occupies a consistent number of characters; each field’s data is followed by enough spaces to reach the proper character count.

The XML factor
It seems likely more database systems eventually will move beyond the current table approach to the eXtensible Markup Language (XML) format. Using XML tags to identify the nature of each entry rather than its placement within a particular field and record could improve efficiency of automated processes, including language and currency translation. Furthermore, because XML is an open standard, XML database users as well as content management systems might find it easier to change vendors as their data management needs expand.

Flat field databases only can access one table of data at a time. Relational databases, as shown above, can access multiple tables simultaneously.

Database dilemma: flat field or relational?
Whether you store your data files as XML or in a more traditional format, you’ll likely choose between the two most popular categories of database programs: flat field or relational. Flat field (or non-relational) databases only can access one table of data at a time. There are dozens of flat field programs, most of which are simple list managers intended to create mailing labels or generate a searchable index for your CD collection.

By contrast, a relational database can access multiple tables simultaneously. Of course, nothing prevents you from using a relational database management system (RDBMS) to work with only a single table.

Despite their ability to link multiple tables, most inexpensive relational databases (such as FileMaker Pro or Microsoft Access) typically are used with only a single data file.

Multiple tables or just one big one?
Having opted for a real database program rather than a spreadsheet, another decision lurks: Should you use multiple tables for your project, or keep it simple by compiling all your data into a single table? The answer depends—not surprisingly—upon the task’s complexity.

If you’re just addressing some envelopes or performing a mail merge on some letters, one flat-data file will suffice. Simply accept the customer’s spreadsheet or tab-delimited text file, check for any missing or corrupted data, sort by ZIP code (for a bulk-rate discount) and imprint your documents. You can use any program that can accept a flat file as input—this includes everything from Microsoft Word to sophisticated software for variable-data printing (VDP). (See "Variable-data software update," January 2004.)

Multiple tables linked via an RDBMS should be used for "one-to-many" situations. Consider the classic VDP example, the car dealership campaign. A single flat-data file can tell you where each potential customer lives and even about the most recent purchases from that dealership, but what about any previous transactions? Using two tables—one for contact information and a second to track business activity—lets you achieve a higher level of personalization. You can build a chummy campaign for the customer who’s purchased her last three cars from your dealership, but deliver a more conciliatory message to the poor guy who bought a car that incurred $4,000 in repair costs.

Query Q&A
Suppose your corporate customers are reluctant to give you their records for a VDP campaign? Or, what if they want an ongoing campaign that constantly generates new orders based on the latest additions to the database? You’ll need to "pull" data from a running database, a practice known as a query. Database manufacturers have standardized the format for these requests using the widely supported Structured Query Language (SQL). The SQL query format allows an end user to ask for information based on certain criteria in (somewhat) plain English. High-end database manufacturers have extended SQL with additional terms and functions for more powerful searches.

In most cases, SQL searches facilitate data extraction—but beware the curse of the ancient mainframe! These dusty relics of the pre-DOS days might have significant limitations that require extra attention—data might be stored as all capital letters, punctuation might be non-existent and fields might be truncated after just 10 characters. Marketing is a battle for hearts and minds; don’t underestimate the power of mainframe data to offend your audience. A letter addressed to "H HINDERLITE," for example, will likely end up in the trash.

Cleanliness is next to postal accuracy
Once you’ve obtained the data for your mass mailing or variable-data project, evaluate the records. Common "dirty data" problems include duplicate records, incorrect data, mismatched fields and nonconformance to postal requirements. Regardless of how careful you are, expect some problems. Your customer might have merged several data files together without realizing it created hundreds of duplicate records. Or perhaps the salesperson input prospects’ first and last names into the "Last Name" field.

Cleaning your data before starting the print run isn’t just a good idea, it’s a U.S. Postal Service (USPS) requirement. To qualify for the best bulk postage rates, you’ll need to submit a "USPS Proof of Accuracy" report with your printed pieces. Coding Accuracy Support System (CASS) software automatically generates these reports.

All CASS-certified data cleansing tools share some basic goals:

  • Checking addresses against the USPS master-addressing database to verify street addresses fit within allowable ranges.
  • Checking for known street names.
  • Verifying standard state abbreviations.
  • Looking for bad ZIP codes.
Note that the CASS system merely verifies that a particular address falls within the possible range of house numbers on that particular street; it doesn’t guarantee that John Doe actually lives there or that there’s even a building at that address. This is an important issue—the USPS charges a fee for each piece returned as undeliverable or rejected after delivery.

The majority of "bad" addresses are likely to have only minor errors, so a robust data-cleansing tool will use additional tricks to salvage as many records as possible. For example, 20 percent of Americans move each year, but only 80 percent of them report their new address to the USPS. Your records can be checked against this National Change of Address (NCOA) database to automatically update the mailing address of those who have filed a change of address form.

Other optional USPS verifications include a database known as the Locatable Address Conversion System (LACS), which updates rural addresses into more conventional address formats, and the new Delivery Point Validation (DPV) database that specifies every deliverable street address in the United States. (CASS only specifies allowable ranges.)

The best defense is a good offense
The best approach is to prevent errors from the start. Fortunately, most databases offer a variety of validation rules (good for enforcing "CA" instead of "Calif") and input filters to avoid entering the wrong type of data (useful for mandating a five- or nine-digit ZIP code). In other cases, it makes sense to preprogram a drop-down list of choices (such as a predefined set of business categories) rather than expect everyone to input exactly the same response.

For an industry that loves spectrophotometers, platesetters and automated presses, our collective fear of numbers is unwarranted. Overcome your phobias and you’ll be able to augment those long print runs with short-run, high-margin VDP jobs. Whether you’re cleaning up your own customer list or producing millions of variable-data brochures, the key to success could be mastering your database.



What’s the difference between a database and spreadsheet?
Microsoft Excel often is incorrectly identified as a flat-field database program. While spreadsheet programs like this create data files that can be imported into a database program, key differences include the following:

  • Sorting | Database programs maintain a link between all fields within a single record, allowing you to quickly sort and re-sort the database without fear of mangling your data. In Excel, you must purposely select all the relevant cells within the spreadsheet before you sort, to avoid disastrous consequences.
  • Cell relations | Spreadsheets treat every cell as an unrelated coordinate, making it easy to delete a few cells then move the remaining cells in that row or column over to fill the empty space. But if you delete "Chicago" from the City field of a specific database record, the database program will not offer to move every subsequent City value forward one record to fill in the gap.
  • Indexing | Every database program uses an index to assist in locating data. The index is a specific column of data within the table that must contain a value unique to that specific record. Consumer-oriented database applications create this index automatically, whereas more sophisticated programs allow a choice between selected or automatically generated indexes.

Spreadsheets don’t use an index to enhance data searches, in large part because they load the values for every cell into RAM memory for fast access. This limits the amount of information that can be stored—65,000 rows, in MS Excel. Also, unsaved data might be lost during a power outage or system crash. (Databases typically write to disk after every new operation.)



The best things in life are free
One-to-one marketing companies are heavy users of relational database programs. Did you know that they often augment their customers’ information with their own databases? Because most printing is regional in nature, you should consider it your mission to locate and obtain data that helps describe your city, state or region. Do some research, and you’ll be amazed at the amount of interesting data that can be downloaded at no cost. Access student-to-teacher ratios from countries around the world (www.uis.unesco.org), discover the number of serious crimes committed in every county of your state (www.ojp.usdoj.gov), read detailed summaries of employment statistics broken down by state and county (www.bls.gov/data), and more.



Managing much more than mailings
Progressive Solutions (Santa Clara, CA) is a 21-employee, 15,000-sq.-ft. operation specializing in digital printing, mailing and fulfillment.

Originally founded as a print broker, the 14-year-old company now produces as many as 300 jobs a month on equipment that includes an HP Indigo 3050, Xerox DocuColor 2045 and two Canon ImageRunners.

Managing inventory
Inventory management is a key challenge, according to Progressive Solutions president Mark Sarpa. "If a client orders 400 copies of a publication or brochure, they might have 200 copies sent directly to them and 95 built into assembled kits," explains Sarpa. "The remaining105 are then added to inventory for future distribution. We need to track inventory on all these levels."

For managing everything from inventory to invoicing to production job management, Progressive Solutions relies on elaborate system of about 50 FileMaker Pro database files containing thousands of individual records. "People don’t think of FileMaker as being on par with an Oracle database, but in many ways it’s just as sophisticated," says Sarpa. He says the program’s flexibility is exactly what Progressive needs. "FileMaker is better because it’s customizable," he says. "Our customers’ files tend to be quite varied. We don’t want to ask them to compromise to fit our system."

What’s new
Progressive Solutions, a FileMaker user since 1991, recently installed FileMaker Pro 7. Sarpa says he particularly likes its ability to handle multiple files within a file as well as store documents electronically.

"We’re very excited about being able to handle unlimited tables within a file," says Sarpa. "We can also store and export documents electronically within FileMaker Pro 7 itself via new flexible container fields."

Progressive Solutions previously stored clients’ electronic documents on a series of five servers. Files were difficult to access and finding a specific file often required a lot of searching. According to Sarpa, FileMaker Pro 7 lets users call up documents with a simple doubleclick. "If a client calls with a question, we can give a ten-second answer instead of asking to call back with the information later," says Sarpa.

See www.filemakerpro.com.



Good addresses but bad prospects?
Some addresses are completely correct, yet still not worth your postage. You can suppress these records from your mailing without deleting them from your database. BCC Software’s Mail Manager 2010 has just expanded its list of suppression options, enabling users to block delivery to those who have joined the Direct Marketing Assn.’s "Do Not Mail" list as well as prisons, nursing homes, colleges, trailer parks, the recently deceased and more.

See www.bccsoftware.com.


Contributing editor Hal Hinderliter serves as director of the Graphic Communication Institute at Cal Poly State University (San Luis Obispo, CA). Contact him at hal@halhinderliter.com.