American Printer's mission is to be the most reliable and authoritative source of information on integrating tomorrow's technology with today's management.

Three cheers for XML

Nov 1, 2003 12:00 AM


When you reflect on the events of 1996, you probably don't think about eXtensible Markup Language (XML). But that's when this markup language got its start. XML enables print providers to move further upstream to become an integral part of clients' publishing needs. Indeed, XML is the backbone behind the hottest acronyms in the graphic arts — Job Description Format (JDF) and computer-integrated manufacturing (CIM).

So what is XML? It's a set of defined symbols and rules for explaining the value of data. XML documents contain both content — typically text and graphics, although sound clips, video and other digital resources may also be referenced — and an explanation of what that content means. This information can be expressed in either a document type definition (DTD) or an XML schema — both offer a standardized means of explaining the content.
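To make the idea concrete, here is a minimal sketch in Python that uses only the standard library's xml.etree.ElementTree module. The catalog, item, title and price element names are invented for illustration and are not taken from any published DTD or schema. The point is simply that software can ask for a piece of content by what it means rather than by where it sits on the page.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog record: the tag names explain what each value means.
CATALOG_XML = """<catalog>
  <item sku="A-100">
    <title>Letterhead, 500 sheets</title>
    <price currency="USD">89.00</price>
  </item>
  <item sku="B-200">
    <title>Business cards, 1000</title>
    <price currency="USD">45.50</price>
  </item>
</catalog>"""


def read_items(xml_text):
    """Return (sku, title, price) tuples by looking elements up by name."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("item"):
        sku = item.get("sku")                  # attribute on <item>
        title = item.findtext("title")         # text inside <title>
        price = float(item.findtext("price"))  # text inside <price>
        items.append((sku, title, price))
    return items


if __name__ == "__main__":
    for sku, title, price in read_items(CATALOG_XML):
        print(sku, title, price)
```

Note that ElementTree only checks that the document is well formed; it does not validate it against a DTD or schema, so a validating parser would be needed for that step.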

Many types of computer software, such as a Web browser or a page-layout program, can use these explanations to understand the data. The implications are astounding. XML data is like a movie with subtitles. It allows the author to present his or her original message in the preferred manner, but it also defines the foreign terms for easier interpretation.

Although XML was created by many of the developers responsible for HyperText Markup Language (HTML), there are some key differences. HTML has few applications beyond displaying text and graphics in Web browsers. XML is much more flexible, since it is intended to facilitate the exchange of information between dissimilar software applications.

As an extensible language, XML allows users to create and define their own language components. That's not the case with HTML: the World Wide Web Consortium (W3C) lists every valid element in its HTML specifications. Also, unlike HTML, which contains information defining both content and presentation (formatting information, such as “bold” or “centered”), XML is devoid of all presentation information. Using cascading style sheets is one option for formatting XML data within Web browsers; alternatively, you can convert the XML to HTML before posting it to a website.
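The separation of content from presentation can be sketched in a few lines of Python. This is a simplified stand-in for a real style sheet or XSLT conversion, not a replacement for one. The article, headline, byline and para element names, and the choice of HTML tags they map to, are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical article markup: no fonts, no bold, no alignment, only meaning.
ARTICLE_XML = (
    "<article>"
    "<headline>Three cheers for XML</headline>"
    "<byline>Staff</byline>"
    "<para>XML separates content from format.</para>"
    "</article>"
)

# Presentation decisions live here, outside the content (assumed mapping).
HTML_TAGS = {"headline": "h1", "byline": "em", "para": "p"}


def to_html(xml_text):
    """Wrap each XML element's text in the HTML tag chosen for it."""
    source = ET.fromstring(xml_text)
    body = ET.Element("body")
    for child in source:
        # Unknown elements fall back to a plain <div>.
        out = ET.SubElement(body, HTML_TAGS.get(child.tag, "div"))
        out.text = child.text
    return ET.tostring(body, encoding="unicode")


if __name__ == "__main__":
    print(to_html(ARTICLE_XML))
```

Changing the look of every headline then means editing one entry in the mapping, while the XML content itself stays untouched.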

Multiple XML efforts, including many related to the JDF initiative, have recently been announced. (JDF communications are carried out via the XML language.) In most cases, incorporating XML within a software program or service allows easier communication with external companies and systems; just as importantly, it allows the protocols for data exchange to be modified and automatically extended. By including metadata describing the meaning of XML content, developers are attempting to automate the delivery of services and information across the Internet.

Today's reality may not match this utopian vision, but there are many different facets to the industry's current XML efforts, including repurposing, multichannel authoring, content-management applications and integration.

The need to convert legacy print documents, such as QuarkXPress or InDesign layouts, into websites has sparked many graphic-arts service providers' interest in XML. Doing so requires only a text editor (for DTD creation), the XML export capability built into both XPress and InDesign, and a utility for performing eXtensible Stylesheet Language Transformations (XSLT). (Windows users can access XSLT conversions from the command line; Mac OS X users should evaluate Marc Liyanage's TestXSLT shareware, available at

Exporting XML for XSLT conversion is somewhat complex, however, requiring some programming skills for creating cascading style sheets and XSL scripts. Fortunately, these processes are only slightly more difficult than writing HTML code for a website, so desktop publishers with one foot on each side of the digital divide can easily learn the skills needed for XML-based repurposing.

While the graphic-arts industry at large is just waking up to the power of XML, one segment of the publishing market has been using XML for years. Instruction manuals, parts catalogs and other types of long-form technical documentation are often created with specialized authoring applications (such as Adobe's FrameMaker) that utilize the Standard Generalized Markup Language (SGML) specification. In the mid-1980s, the U.S. Dept. of Defense mandated SGML for its document publishing so that manuals could be simultaneously output in printed and on-screen formats; consequently, many vendors have added SGML compatibility to their tools. XML has its roots in the SGML specification, and the broad interest in XML has sparked a wave of conversions among document-publishing applications.

Enterprise-level solutions integrate multiple components for authoring, editing, storing and formatting XML content. While the price of these products may be too steep for most printers, corporations often turn to these solutions for high-volume publishing projects. Former FrameMaker developer Datalogics (Chicago) is one example of an SGML composition vendor that now offers a tool for XML publishing (DL Formatter); other leading vendors of large-scale, XML-driven document-authoring solutions include Advent (College Park, MD), Arbortext (Ann Arbor, MI) and XyEnterprise (Reading, MA).

Less complex modular XML authoring and editing products include SoftQuad's Xmetal (recently acquired by Corel) and Altova's (Beverly, MA) xmlspy 2004. Altova offers the Windows-only xmlspy 2004 application in enterprise, professional and home editions. Despite its $49 price tag, the home edition can be used for XSL transformations as well as editing and validation of XML, DTDs and schemas. Corel (Ottawa, Canada), known for its CorelDraw publishing program, has added some features from its WordPerfect and Ventura Publisher applications to its new acquisition, Xmetal 4, which now offers a customizable interface that can be configured as a word processor or a forms-building application.

Combining an XML editor with an XML publishing system may work well for communicating a variety of digital formats (Web pages, e-books, CD-ROMs, Web-enabled cellphones, etc.), but some companies have found it doesn't provide the desired level of control for sophisticated print projects. Although importing XML data into an InDesign or XPress layout allows typographic manipulation of kerning and tracking, as well as color management and advanced color-separation features, Adobe and Quark's built-in XML import capabilities may not suit every need.

Some printers are combining the capabilities of a page-layout program with the tools of a third-party vendor. “We had no skills in XML when we started, although it's pretty easy to pick up,” says Steve Spink, program manager for Sotheby's (New York City) auction house. After evaluating catalog-publishing systems based on ASCII text and Rich Text Format import, Sotheby's adopted the Atomik Roundtrip XTension from Easypress (London). “We now use XML to go to both print and the Web, enabling us to separate content from the publication process. All the formatting is owned by the publishing group, working in XPress,” notes Spink. “This is a huge step forward, because it gives us a lot more flexibility.”

Salt Lake City's Rastar Digital Media, Inc., wanted to go beyond flowing data into text and picture boxes. “About 90 percent of our print work is variable, so we're always looking for new solutions that provide faster and higher-quality output,” says Rob Drage, Rastar's manager of R&D. Rastar uses Netherlands-based TechnoDesign's XML Impression. XML Impression works in conjunction with TechnoDesign's TCP-IP XT, allowing a Windows server to automate the creation of QuarkXPress pages on a remote Macintosh or PC workstation. More than 300 scripted actions are available, enabling text and picture boxes to be resized on-the-fly during the composition process. “Adopting XML gave us the ability to go straight from the database through XML Impression to the press using an automated workflow, including full typographic controls,” says Drage. “We can also use the same XML data stream to create Web pages using XSLT.”

Exporting the contents of existing print documents creates a single XML file, but dynamic creation of catalogs, directories and Web pages requires the flexibility of a database. Canada's IxiaSoft claims to have created the first commercial XML database, TEXTML Server, which has been deployed by the U.S. Air Force and others. IxiaSoft's first customer, Spanish publishing integrator AiLink, has used TEXTML Server to host its Infopolis editorial content-management system since 1999.

XML repositories extend database functionality to include features such as version control (check-in/check-out) and management of project components, including XML documents, schemas and XSLT style sheets. Interwoven (Sunnyvale, CA), a name already known to many printing and publishing firms for its Mediabin digital-asset-management solution, offers one such XML repository, TeamXML. When features such as access synchronization, workflow control and content deployment are added, the moniker “XML content-management system” is more likely to be invoked.

SiberLogic (Woodbridge, Ontario) allows users to access its SiberSafe XML content-management system through a Web browser, desktop clients or via the WebDAV file-transfer protocol.

Adobe's (San Jose) eXtensible Metadata Platform (XMP) represents Adobe's definition of how it will structure the metadata its applications embed into files. Metadata is often defined as “the data about the data,” which can be used to describe processing instructions, restrictions on usage or a host of other pertinent details. Adobe has made the procedures for creating, embedding and accessing XMP data available to all interested vendors, but only time will tell if other graphic applications will utilize these XML notations. IxiaSoft (Montreal) is an early adopter of the XMP specification — its TEXTML Server can automatically capture and catalog XMP data from Adobe graphic assets.
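As a rough illustration of what "capturing and cataloging" XMP data involves, the Python sketch below reads properties out of a small XMP-style packet. XMP is serialized as RDF/XML, and the namespace URIs shown are the ones commonly used for XMP, RDF and Dublin Core. The packet contents, the application name and the catalog_xmp function are hypothetical, and the sketch skips the step of locating the packet inside a binary image or PDF file. It says nothing about how TEXTML Server itself works.

```python
import xml.etree.ElementTree as ET

NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
}

# Hypothetical packet; real files carry many more properties.
XMP_PACKET = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:xmp="http://ns.adobe.com/xap/1.0/">
      <dc:format>image/tiff</dc:format>
      <xmp:CreatorTool>Hypothetical Layout App 1.0</xmp:CreatorTool>
      <xmp:CreateDate>2003-11-01T00:00:00Z</xmp:CreateDate>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def catalog_xmp(packet):
    """Collect simple text properties into a {name: value} record."""
    root = ET.fromstring(packet)
    record = {}
    for description in root.iterfind(".//rdf:Description", NS):
        for prop in description:
            # ElementTree spells tags as "{namespace-uri}LocalName".
            local_name = prop.tag.partition("}")[2]
            record[local_name] = (prop.text or "").strip()
    return record


if __name__ == "__main__":
    print(catalog_xmp(XMP_PACKET))
```

Only flat text properties are handled here; structured values such as language alternatives or lists would need additional code.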

In addition to XML databases from vendors such as IxiaSoft and SiberLogic, popular enterprise databases are in the process of adding XML compatibility to their list of features. For those who are not yet ready to upgrade their database platform, utilities and hand-coded “fixes” can be used to retrieve XML-formatted data. “We run our projects on either Microsoft's SQL Server or Oracle databases, depending on the size of the project,” reports Rastar's Drage. “Since our databases don't directly support XML, our programmers have created a routine to convert exported tab-delimited text into the XML format.” Oracle (Redwood Shores, CA) has recently released its new XML DB database; Software AG's (Darmstadt, Germany) Tamino XML Server is a similar enterprise-level product.
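A conversion routine of the kind Drage describes might look something like the following Python sketch. It is not Rastar's code. The sample data, the tab_to_xml function and the records and record element names are all assumptions, and the sketch presumes that the first row of the export holds column names that are already valid XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical tab-delimited export: a header row, then one row per record.
TAB_EXPORT = (
    "first_name\tlast_name\tcity\n"
    "Ann\tLee\tChicago\n"
    "Raj\tPatel\tSalt Lake City\n"
)


def tab_to_xml(text, root_tag="records", row_tag="record"):
    """Turn each row into a <record> whose child elements are the columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    root = ET.Element(root_tag)
    for row in reader:
        record = ET.SubElement(root, row_tag)
        for field, value in row.items():
            # The serializer escapes characters such as & and < for us.
            ET.SubElement(record, field).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(tab_to_xml(TAB_EXPORT))
```

Building the tree with a library rather than pasting strings together means reserved characters in the data are escaped automatically, which is a common stumbling block in hand-coded conversions.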

What if you don't have a programmer on staff? According to Frank Kanonik, Xerox (Rochester, NY) customer-training specialist, the latest version of Microsoft Excel can be used for XML conversion. “While it's no substitute for an XML database, Excel XP does make it simple to export flat field databases in XML format,” says Kanonik. “As the popularity of XML-based variable-printing tools becomes more widespread, Microsoft's Word and Excel can be low-cost additions to the printer's toolbox.”

XML enables the automatic passage of data between independent computer systems — eliminating tedious rekeying of data while facilitating the flow of job information from estimating to order-entry systems to production and accounting programs and beyond. The intent behind open specifications such as XML and JDF is to allow workflow components to integrate more easily than ever before. Although some printers will encounter this connectivity as a single-vendor implementation (such as Heidelberg's Prinect suite of tools, Agfa's ApogeeX and Delano, Screen's TrueNet or Dalim Software's Printempo and Mistral), JDF promises to pave the way for multivendor interoperability. But this is easier said than done.

In a recent article for an online news portal, consultant Andy Tribute explains that one vendor's definition of “JDF-compliant” may differ from another's. “There's no guarantee that other suppliers will choose the same JDF elements in their implementations. That means while both systems will be JDF-compliant, they may not necessarily interface together successfully.”
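The mismatch Tribute describes can be sketched with a deliberately simplified job ticket. The ticket below is not real JDF. Its element names, the two receiving systems and the unsupported_elements function are all hypothetical, chosen only to show how two systems can read the same XML and still fail to cover the same information.

```python
import xml.etree.ElementTree as ET

# Hypothetical job ticket written by an estimating system (not real JDF).
TICKET = """<job id="J-1001">
  <quantity>5000</quantity>
  <stock>80lb gloss text</stock>
  <fold>letter</fold>
  <stitch>saddle</stitch>
</job>"""

# The elements each hypothetical receiving system chose to implement.
PRESS_SYSTEM = {"quantity", "stock"}
FINISHING_SYSTEM = {"quantity", "fold", "stitch"}


def unsupported_elements(ticket_xml, supported):
    """List, in document order, the ticket elements a receiver would ignore."""
    root = ET.fromstring(ticket_xml)
    return [child.tag for child in root if child.tag not in supported]


if __name__ == "__main__":
    print("press ignores:", unsupported_elements(TICKET, PRESS_SYSTEM))
    print("finishing ignores:", unsupported_elements(TICKET, FINISHING_SYSTEM))
```

Both receivers parse the ticket without error, yet each silently drops part of the job. Agreeing in advance on a common set of elements is one way to avoid that outcome.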

Toward that end, two years ago Creo (Billerica, MA) announced Networked Graphic Production (NGP), a strategic initiative whose members are committed “to defining, developing, testing and delivering JDF-based integration between their systems.” To ensure good integration among multivendor equipment, NGP members have agreed to define and use a standardized set of JDF-based interfaces. NGP has more than 25 members, including software (Adobe); MIS (DiMS!, Printcafe, Primac, Prism, Radius and Streamline Solutions); digital print (Xerox); press (KBA, Komori, MAN Roland and Mitsubishi) and postpress vendors (Muller Martini and MBO).

Heidelberg (Kennesaw, GA), for its part, has stated its first priority is to have all of its own prepress, press and postpress products JDF compliant in time for Drupa 2004. Once the vendor has achieved this goal, it will begin working with other vendors on interoperability efforts. (See “Next generation networking,” July 2003.) At Graph Expo, however, Heidelberg announced plans to collaborate with Electronics for Imaging (Foster City, CA) and Printcafe to integrate Printcafe's print-management software with Heidelberg presses “through open JDF connections as specified by the CIP4 organization.” (As part of this agreement, Heidelberg, which offers Prinance, an MIS for small to midsize printers, also will refer larger North American printers to Printcafe's Hagen OA product as appropriate.)

As Drupa approaches, we can expect more vendor JDF alliances and new product announcements. But now is the time to establish your own process automation goals and objectives. Vio Inc. (Roseland, NJ) offers a free booklet, “Process Automation in Printing & Publishing,” that explains the JDF basics and features a checklist for ensuring that JDF-enabled software, systems and equipment meet some basic criteria. To download a copy, visit or e-mail

XML separates content from format, but it has far greater implications for the graphic arts. The motivation for mastering these new tools should be as clear as the industry's declining revenues. Printers can either use XML to become an integral part of clients' publishing needs — or accept a shrinking role in the communication process.

MIS & XML: Ready when you are

Connecting customers to a management information system (MIS) through the exchange of XML data is a hot topic, but not a new concept. “PrintTalk was formed in 2000 to develop a specification for e-commerce,” says Chuck Gehman, Printcafe's (Pittsburgh) director of product marketing and author of “Understanding Information Technology for Print Management.” “PrintTalk published an XML-based e-commerce specification and has become a close partner with the CIP4 group in developing the XML-based JDF specification. PrintTalk's XML-based specification, also referred to as PrintTalk, extends JDF beyond the production facility.”

PrintTalk combines JDF and commercial XML (cXML), an openly available e-business XML format pioneered by Ariba.

According to Gehman, software systems built to these specifications ultimately will allow requisitions, order and job specifications to flow seamlessly among customers, printers, printing equipment and suppliers. “These are open standards, meaning each print-service provider has the freedom to assemble an interoperable workflow, business and e-commerce system from products that can come from competing vendors.”

XML goes online

E-commerce vendor Printable (Solana Beach, CA) has developed solutions for many corporate print buyers using its PrintGateway integration platform, based on PrintTalk and hosted on Microsoft's BizTalk XML server platform. “We're confident we can use XML to integrate with just about any front-end or back-end system that's out there,” noted Joe Fedor, Printable's director of product management. “We've standardized the output from our front-end system using PrintTalk and the BizTalk server,” explains Fedor, “so that every time an integration comes up, it's just a matter of reworking the code from the BizTalk subsystem.”

Pace, Parsec, Prism, Profit Control Systems and other MIS vendors now offer Printable Web-browser estimating modules that use the PrintTalk specification to interface to their estimating systems. (See “Web-based workflow tools,” August 2003.)

Gehman reports that Printcafe uses XML in virtually all of its products. It recently announced PrinterSite Fulfillment, an add-on storefront website and catalog module that uses XML messaging to communicate with the MIS at the printer's facility. “Rather than [using] a general purpose, off-the-shelf interface, we looked closely at customers' and printers' requirements and designed our XML messaging architecture to meet those needs,” says the exec. This architecture is also featured in Printcafe's Prepress and Press Connectors. Based on JDF, these products link Printcafe's MIS with prepress workflow systems such as Prinergy and press consoles to provide a real-time, two-way flow of information.

Gehman, a PrintTalk board member, notes that the industry has been slow to take advantage of the specification's capabilities. “There has to be customer demand for interoperability to make sense,” he says. “Customers aren't banging down the doors asking for PrintTalk connectivity. If there's a customer need, we'll be there with PrintTalk transactions — we have a flexible architecture that will easily let us do that.”

XML: from print to online

Step 1: Stylesheets | Exporting XML from page-layout documents is easiest if you've used paragraph and character styles; it's easier still if the stylesheet names are lower case with no spaces or punctuation.

Step 2: Create (modify) DTD | Quark's avenue.quark tutorial comes with a sample “WhitePaper.dtd.” Using BBEdit, SimpleText or any other text editor, you can modify it to meet your own needs. But every stylesheet from your page-layout document must be defined as an XML element within the document type definition (DTD).

Step 3: Match styles to tags | InDesign can automate this step, while XPress allows you to set up reusable “tagging rules.” After you've associated your paragraph and character styles with specific XML elements, you're ready to export.

Step 4: Export XML | This creates a text file, containing your tagged XML data; in many cases, you'll want to embed the DTD inside this XML file for increased portability.

Step 5: Create/modify your cascading stylesheets (CSS) | The end result of this process will be an HTML file that uses CSS formatting, so define those styles now. CSS files can be downloaded from the Web. You can also use the “save text as HTML” option in XPress to build a CSS resembling your document's existing format.

Step 6: Create/modify eXtensible stylesheet language (XSL) | Defining your conversion script is one of the most challenging parts of the process; examine the tutorials in TestXSLT for command examples. XSL scripts can insert navigational elements and CSS data, or even break long documents into multiple Web pages.

Step 7: Transform to HTML | Run the XML through your preferred XSLT engine to produce browser-friendly HTML, ready to post on your website or distribute via CD-ROM.
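The first four steps can be sketched in Python to show what the page-layout program is doing behind the scenes. This is an illustration of the logic only. It does not use the XPress or InDesign export features, and the function names, the story root element and the sample styles are assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET


def element_name(style_name):
    """Step 1: lower case, with spaces and punctuation stripped."""
    name = re.sub(r"[^a-z0-9]", "", style_name.lower())
    # XML names cannot begin with a digit, so add a letter if needed.
    if not name or name[0].isdigit():
        name = "s" + name
    return name


def build_dtd(root_name, style_names):
    """Step 2: declare one text-only element per stylesheet."""
    names = [element_name(style) for style in style_names]
    lines = ["<!ELEMENT %s (%s)*>" % (root_name, " | ".join(names))]
    lines += ["<!ELEMENT %s (#PCDATA)>" % name for name in names]
    return "\n".join(lines)


def export_xml(root_name, paragraphs):
    """Steps 3 and 4: tag each (style, text) paragraph, then serialize."""
    root = ET.Element(root_name)
    for style, text in paragraphs:
        ET.SubElement(root, element_name(style)).text = text
    return ET.tostring(root, encoding="unicode")


# Hypothetical layout content: (paragraph style, text) pairs.
LAYOUT = [
    ("Head Line", "Three cheers for XML"),
    ("Body Text", "XML got its start in 1996."),
]

if __name__ == "__main__":
    styles = list(dict.fromkeys(style for style, _ in LAYOUT))
    print(build_dtd("story", styles))
    print(export_xml("story", LAYOUT))
```

Because the DTD declarations and the exported tags are both derived from the same style names, the two stay in step, which is the reason Step 2 insists that every stylesheet be defined in the DTD.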