This is the second in a series of three articles for experienced instructional designers about XML and content reuse systems. The previous article in this series discussed the basics of XML as it applies to learning content management. XML, a language derived from SGML that offers a simplified and optimized approach to structuring content, allows you to create learning objects and transform them into a variety of different forms. This article focuses on the methods and benefits of organizing content into repositories, and on the tools that can be used to create the content stored there.
Editor’s Note: Parts of this article may not format well on smartphones and smaller mobile devices. We recommend viewing on larger screens.
- Part I: Introduction to XML and Repository Technologies
- Part II: Implementing Content Repositories & Selecting Tools
- Part III: Creating a Unified Content Strategy
Taxonomy
Our scientific understanding of any topic is founded upon taxonomic processes: we take things apart to see how they work. We can gain a better understanding of a whole system by examining its parts and then recombining them, gradually coming to understand how those parts interrelate.
In a very basic sense, a content reuse system scientifically divides content into associative, functional, or structural taxons. (See the sidebar on page 14, Definitions.) This taxonomy of information is what makes meaningful reuse feasible. How well the taxonomy is applied to enterprise information determines whether the content reuse system produces benefits for the organization or becomes just another expensive good idea.
All learning objects are defined by taxonomies. These taxonomies express the way in which each object is understood, used and maintained. In evaluating how to construct learning object models for an XML repository, it is very important to understand that these models are used to define queries. The value of the system depends upon the ease and accuracy of queries. Many organizations discovered too late that they had expended substantial resources in creating an XML (or SGML) repository that provided no additional benefit over cutting and pasting documents from a file server because their content authors could not find anything in the repository.
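To make the link between taxonomy and queries concrete, here is a minimal sketch in Python. The element name `learningObject` and the attributes `topic`, `audience`, and `type` are illustrative assumptions, not part of any particular DTD or product. The point is only that each classification you record becomes something an author can later search on.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository fragment: element and attribute names are
# illustrative assumptions, not taken from any particular DTD.
REPOSITORY_XML = """
<repository>
  <learningObject id="lo-001" topic="pricing" audience="sales" type="procedure">
    <title>Quoting a bundled offer</title>
  </learningObject>
  <learningObject id="lo-002" topic="pricing" audience="support" type="concept">
    <title>How bundle pricing is calculated</title>
  </learningObject>
  <learningObject id="lo-003" topic="installation" audience="support" type="procedure">
    <title>Installing the base unit</title>
  </learningObject>
</repository>
"""

def find_objects(root, **taxonomy):
    """Return learning objects whose attributes match every given taxonomy value."""
    # Build one XPath attribute predicate per classification, e.g. [@topic='pricing'].
    predicates = "".join(f"[@{name}='{value}']" for name, value in taxonomy.items())
    return root.findall(f"./learningObject{predicates}")

root = ET.fromstring(REPOSITORY_XML)
pricing_for_sales = [lo.get("id") for lo in find_objects(root, topic="pricing", audience="sales")]
support_procedures = [lo.get("id") for lo in find_objects(root, audience="support", type="procedure")]
print(pricing_for_sales)
print(support_procedures)
```

If an object had been stored without an `audience` attribute, neither query could find it, which is the failure described above: content that exists in the repository but cannot be located.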
Repository design
A content repository has several different purposes:
- To store controlled versions of documents
- To store current versions of learning objects
- To store in-work versions of learning objects
- To publish content to Web servers
- To publish content to other servers (Learning Management Systems, abbreviated as LMS)
- To function as an ISO Repository
The most important reason for having a repository is to facilitate collaboration between content creators, editors and production staff. One mistake often made with a complex repository is to make customized views that are not shared between different team members. This can be frustrating and time consuming.
Once everyone has gone through the arduous task of chunking and labeling their legacy content, this content needs to be put into a repository where it can be easily accessed. The best way to do this, for instructional designers, is to put the content into a version control system that is linked to a database.
ClearCase, for example, is a version control application that can present several different views of the repository for different uses. One view presents a virtual file server that contains all the most recent versions of the training documents. Another view presents selected documents to a Web server or LMS. Yet another view presents the XML database elements. Other views can be developed for specific uses, such as creating archives of content, presenting catalogs of approved artwork or source content for other servers such as Adobe Document Server or FrameMaker Server.
The road to XML content reuse is simply a progression of responses to the challenges faced by learning organizations. Generally speaking, there are six steps taken on the path from no content sharing and reuse to a comprehensive XML repository system:
- File Server — A “shared drive” accessible to all team members with read and write permissions to all.
- Version Control System — A collection of documents, stored by document version to protect against accidentally overwriting files.
- Document Manager — A software system that provides different levels of access to documents based upon selectable criteria.
- Learning Management System (LMS) — A system that provides access to learning content for students, authors, and editors. The modern LMS usually provides some kind of virtual campus paradigm.
- Learning Content Management System (LCMS) — A system that divides up learning content into manageable components, which can be dynamically revised in some or all of its instances in the curriculum.
- XML Repository — A system that applies content taxonomies to organize content into associative and structural classifications so that content can be created and managed with maximum efficiency.
Not every organization progresses through each step in an orderly manner. It is often the case that different groups within a learning organization implement different steps at different times, and then face significant challenges integrating the results. Table 1 summarizes some of the objectives and limitations of each step in the progression.
| Step | Objectives | Limitations |
| --- | --- | --- |
| File Server | To permit access and sharing of files between many users. | Slow, insecure, and does not scale well. |
| Version Control System | To maintain different versions of the same document so that the newest (best) can be identified. | Complex to maintain and difficult to use when additional features are added. |
| Document Manager | To automate more complex features (and rules). | Proprietary — software does not keep pace with new tools and processes. |
| Learning Management System (LMS) | To improve the efficiency of training content delivery and progress tracking. | Can limit designers in terms of format or delivery methods, may not accommodate editing and version control well. |
| Learning Content Management System (LCMS) | To improve the efficiency of training development through content management and reuse. | Often includes a poor user interface; extensive customization required. |
| XML Repository | To provide content reuse, multiple output formats, and extensibility to react to changing needs. | Requires rethinking of the development model by designers. |
When computer networks became common in the workplace, people abandoned the file cabinet for the file server. They soon learned that file servers have their own defects when it comes to sharing important information. The next logical step was to try to remove the most glaring defects of the file server by implementing a version control system. Version control systems made it safer to put your documents onto the network and easier to find things, but when large numbers of people put large numbers of documents into the system, finding things became hard again.
Then came the document management system, which made it simpler to find things but usually locked you into tools and processes that rapidly became outmoded. A good example of this last hurdle to progress was a large legal firm that implemented a complex macro-language-driven documentation system that interoperated with their document management system. When the next version of MS-Word arrived, they were very upset to find that there was no backward compatibility, so they remained with the older version of MS-Word for ten years.
Learning management systems (LMSs) are primarily student-facing applications. Their purpose is to present training to a student population and to track student performance. Over time, more and more content management facets have been sneaking into these learning delivery platforms. That is not their core function, which is to deliver existing content to students efficiently. Although learning content management systems (LCMSs) are designed to efficiently manage content, they suffer from a lack of flexibility and timeliness.
Everything that is true of document management tools locking you into particular tools and processes is also true of LMS/LCMS deployments, only much more so. Most LCMSs have their own content creation tools, which may be very well intentioned, but which also may fall very short of the functionality and finesse represented by other commercial applications. Of course, most will work with major content generators (more or less), such as MS-Word and Adobe FrameMaker, but they increase the complication of version upgrades by several orders of magnitude. This is a significant expense that must be factored into the cost of ownership and operation of these systems.
The best of the available LCMS systems are blended XML solutions. These systems use XML/XSLT technology as a transformation mechanism, but retain a proprietary data architecture for database functions. In this way, they have many of the advantages of XML technology, such as interoperability, SCORM-compliance, and access to XML enhancements, and they can also customize the database engine to provide better system performance for content management functions. OutStart Evolution®, Aspen®, and learn eXact® are all examples of blended XML systems.
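The transformation idea can be sketched in a few lines. Production systems would typically use XSLT stylesheets for this step. Python's standard library has no XSLT processor, so the sketch below substitutes plain Python functions as a stand-in. The element names and sample content are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical learning object; element names are illustrative assumptions.
SOURCE = """
<learningObject id="lo-010">
  <title>Resetting a password</title>
  <step>Open the account page.</step>
  <step>Select Reset Password.</step>
</learningObject>
"""

def to_html(lo):
    """Render the learning object as an HTML fragment (for Web or LMS delivery)."""
    section = ET.Element("section", {"id": lo.get("id")})
    ET.SubElement(section, "h2").text = lo.findtext("title")
    steps = ET.SubElement(section, "ol")
    for step in lo.findall("step"):
        ET.SubElement(steps, "li").text = step.text
    return ET.tostring(section, encoding="unicode")

def to_text(lo):
    """Render the same learning object as plain text (for a printed handout)."""
    lines = [lo.findtext("title").upper()]
    lines += [f"{n}. {step.text}" for n, step in enumerate(lo.findall("step"), start=1)]
    return "\n".join(lines)

lo = ET.fromstring(SOURCE)
html_output = to_html(lo)
text_output = to_text(lo)
print(html_output)
print(text_output)
```

Both outputs come from the same source element, so a correction made once in the XML appears in every format the next time it is rendered.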
Once you have an XML repository, your repository can interoperate with other systems, such as LMSs or even LCMSs, but the content is organized for your exclusive needs and convenience. If your needs or tools change, so can the repository. You have created for yourself an “Open Source” solution. For that reason, the XML repository is simpler to upgrade than many proprietary solutions.
XML and SGML were developed specifically to provide a structure and methodology for content reuse. Many of the lessons learned from early SGML implementations were built into XML, which provides a more streamlined and less labor intensive means of achieving high quality content reuse.
Question: If XML repositories are so great, then why doesn’t anyone market an XML repository as an LCMS? Answer: Practically all LCMS vendors are organized according to a service consulting business model. They invest massive amounts of time and money to create efficient systems, which they practically give away for free. They do this so that they can sell you customizations, service, training, maintenance, and support. A pure XML repository system could be serviced and maintained by a wide variety of vendors, so they might never earn back the investment they made in creating the solution.
The proprietary product offer does tie the business to the vendor, but it also ties the vendor to the business because the vendor has a huge stake in the outcome of the LCMS implementation.
There are some pure XML repository LCMS solutions that have been developed by the Open Source community (principally by and for academic institutions). They are more like do-it-yourself kits than a fully-developed product offering and do not offer the reliability, features, or performance of commercial off-the-shelf (COTS) solutions.
Reusing content
Legacy content comes in many different forms. Most of these forms represent document instances. Most organizations attempt to maintain a repository of these document instances according to some meaningful hierarchy. ISO documentation standards are an example of this kind of document-centric hierarchy. If documents are correctly named, stored and updated, then the information they contain can be reused, but the process is slow, laborious and susceptible to human error. The utility of simple file sharing is inversely proportional to the number of documents to be shared.
When existing content is chunked, it usually begins in documents that are broken down into component topics and then broken again into smaller pieces identified as introduction, main body, and transitions. Content should sound natural and appear to have been written specifically for each use. Content also is chunked by audience and complexity so that relevant material and more complex discussion can be added or removed easily.
Audience plays a big role in content reuse. Identifying specific blocks of information as appropriate or inappropriate for different audiences can simplify document creation immensely. It also is the hardest classification to accomplish.
For example, consider an Offer Brief: a document that quickly informs sales staff of new offers, pricing and conditions that apply to selling a product or service within a given market. These things are constantly changing. It is a major task to keep this kind of training content accurate and timely. Most of the documents have a similar look and feel. There may be specific types for different audiences or products, but a single item of information may find its way into 30 or 40 different presentations. Along the way it may get a different style — it may appear in a table here and in a paragraph of text there, but the data behind it is identical. It is possible with each new iteration to do a keyword search through a documentation set and locate all known matches, then copy in the revised information. That usually takes too much time and trouble to be worth doing on a regular basis, unless it is very special information.
In comparison, with a properly constituted XML repository, the process is much more direct. Instead of working backwards from finished documents to find the appearance of specific content in context, the source content is already organized according to what it contains. The author goes to that container, revises it, refreshes the repository and the next time the document instance is called, it collects its source content from the updated source, applies the proper formatting, and compiles the finished document. All 30 or 40 documents that touch this same source content are thus automatically updated.
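A minimal sketch of this single-source behavior follows, using plain Python dictionaries in place of a real repository. The chunk IDs, document names, and prices are invented for illustration.

```python
# Hypothetical source chunks keyed by ID; IDs and text are invented for illustration.
chunks = {
    "offer-price": "The Basic plan costs $20 per month.",
    "offer-terms": "A 12-month commitment applies.",
    "greeting": "Welcome to the sales briefing.",
}

# Each document instance stores references to chunks, never copies of the text.
documents = {
    "offer-brief": ["greeting", "offer-price", "offer-terms"],
    "quick-reference": ["offer-price"],
}

def compile_document(name):
    """Assemble a document instance from the current state of its source chunks."""
    return "\n".join(chunks[chunk_id] for chunk_id in documents[name])

before = compile_document("quick-reference")

# The author revises the one source container...
chunks["offer-price"] = "The Basic plan costs $18 per month."

# ...and every document that references it picks up the change when next compiled.
after_brief = compile_document("offer-brief")
after_reference = compile_document("quick-reference")
print(after_brief)
print(after_reference)
```

The design choice that matters is that documents hold references rather than copies. With copies, the revision would have to be repeated in every document that used the price.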
There was more work done in the very beginning to properly analyze and attribute the content, but as the content is used to create more and more instance documents, those documents become progressively less expensive to create, manage and update. It makes it possible to do the previously unthinkable:
- Provide an individualized training syllabus for every employee.
- Implement weekly updates across training syllabi.
- Create monthly updates to training.
- Ensure global identification of misinformation.
- Provide personalized Web-based training tied to employee reviews.
By increasing the efficiency with which content can be created, the quality and timeliness of all the training deliverables can be increased without raising the cost into the stratosphere.
Process
As H. L. Mencken said, “For every human problem, there is a neat, simple solution; and it is always wrong.”
This section describes the development process used to implement the XML content reuse system. Each description includes a discussion of the costs and benefits associated with each process.
Manual reuse systems
In traditional, project-oriented design settings, each new project was a separate entity. Analysis, development, and production were defined by the time line and requirements of each discrete project, and instructional designers produced design and content as an artisan custom-crafting a product for a customer. When this process has worked correctly, it has worked very well. Students receive curriculum that is specifically fashioned to address their needs. Trainers and designers can be student advocates at many different levels. Everybody wins. However, there are some important limitations to this methodology.
It is important to understand that these limitations and disadvantages are not a function of the skills or artistry of the designer. However dedicated and talented a designer might be, if armed only with a typewriter and a mimeograph machine, he will be at a disadvantage compared with someone, of perhaps more pedestrian talents, who is provided with computers and Web-based delivery options.
At the same time, it must be admitted that the best tools will not make a poor designer produce excellent training content. Really good tools have been used to camouflage poor design. It is certainly easier for an incompetent instructional designer to produce much poorer training deliverables with an XML content reuse system than when working alone with MS-Word.
Assuming competent designers, some of the most important limitations and disadvantages of the cottage industry approach to instructional development are:
- Inconsistency — Since every project is independent of every other, it is very difficult to create and enforce standards. Even if templates are used, designers tend to create exceptions.
- Inefficiency — There are many opportunities for reusing content that are missed, either because designers are unaware of legacy content that could be adapted, or because the legacy content is in a format that makes it difficult to adapt to their current project.
- Inaccuracy — Because each project recasts some of the same information in a different way, there is no way to globally update information and reissue training when changes occur.
- Scalability — As workloads increase and staffing levels decline, there is no way to maintain output and quality levels. Designers become frustrated when they’re unable to meet the expectations of their audience.
- Tool Costs — Reliance on outmoded tools, different versions of standard tools, and fringe tools complicates things, and makes people less efficient. The cost of maintaining learning materials sourced in multiple tools is enormous. Standardization on a few tools and methods makes a substantial difference to the production cycle.
XML automated systems
Figure 1 describes a content authoring and delivery system for both online and hard-copy training deliverables. In this example, light blue indicates tools from Adobe, orange indicates tools from Macromedia, yellow indicates tools from Microsoft, and purple indicates tools for open source components or outputs. This is only one of many equivalent solutions.

Figure 1: This content authoring and delivery system produces both online and hard copy training deliverables.
The structured approach to instructional design is seen to have the following benefits, as William and Katherine Horton point out in e-Learning Tools and Technologies:
- The same courses are delivered across multiple media and delivery environments. Just because it happened to be developed by X using Y, this doesn’t stand in the way of it being reused in a completely different environment or with different tools.
- The structured development model supports a consistent instructional design and development process. Designers have many new options that come from an efficient production design.
- XML content can be analyzed and repurposed much more efficiently than legacy content. The content does not hide in a forest of words. When needed, new and legacy content can be efficiently blended to create educational tools to suit different needs of different student audiences.
- Learning content is organized for use. Related content is accessible. Related procedures and policies are obvious — as are conflicts and inconsistencies.
- Because the relationships between concepts and ideas are mapped according to the taxonomy by which the content was chunked, identifying content for reuse and the updating of legacy materials is significantly streamlined.
- Content conforms to Information Technology standards to ensure portability and long-term use.
There are three steps in the process of implementing an XML content reuse system: 1) Analysis, 2) Chunking, 3) Operation. The process is very simple, in theory:
- A document type definition (DTD) is selected and tested.
- The repository is created using tables that mirror the DTD.
- Legacy content is converted to XML.
- XML content is placed in the repository.
- Users query the database to construct new documents.
- Users add new content to the repository as needed.
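The steps above can be sketched end to end with Python's built-in `sqlite3` and `xml.etree.ElementTree` modules. This is only a sketch under stated assumptions: the `<topic>` content model, its `audience` and `subject` attributes, and the table layout are all hypothetical, and a real implementation would derive the columns from the DTD that was actually selected.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumed content model: a <topic> with id/audience/subject attributes,
# a <title>, and a <body>. The table mirrors that (hypothetical) structure.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE topic (
           id TEXT PRIMARY KEY,
           audience TEXT NOT NULL,
           subject TEXT NOT NULL,
           title TEXT NOT NULL,
           body_xml TEXT NOT NULL
       )"""
)

def add_topic(xml_text):
    """Parse one converted XML topic and store it in the repository table."""
    topic = ET.fromstring(xml_text)
    conn.execute(
        "INSERT INTO topic VALUES (?, ?, ?, ?, ?)",
        (
            topic.get("id"),
            topic.get("audience"),
            topic.get("subject"),
            topic.findtext("title"),
            ET.tostring(topic.find("body"), encoding="unicode"),
        ),
    )

def query_topics(audience, subject):
    """Return (id, title) pairs matching the requested classification."""
    rows = conn.execute(
        "SELECT id, title FROM topic WHERE audience = ? AND subject = ? ORDER BY id",
        (audience, subject),
    )
    return rows.fetchall()

# Converted legacy content is placed in the repository...
add_topic('<topic id="t1" audience="sales" subject="pricing">'
          '<title>Offer pricing</title><body><p>Current price list.</p></body></topic>')
add_topic('<topic id="t2" audience="support" subject="pricing">'
          '<title>Billing questions</title><body><p>Common billing issues.</p></body></topic>')

# ...and users query it to find material for a new document.
sales_pricing = query_topics("sales", "pricing")
print(sales_pricing)
```

Adding new content later uses the same `add_topic` call, so legacy and new material end up classified the same way.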
As mentioned before, the initial analysis is perhaps the most difficult stage of the implementation, and it is the one stage that has the most persistent effects. Once the organization has decided upon a single way of parsing the content, staff members must be carefully trained in how to chunk legacy content into the system.
Legacy content chunking
Whether this chunking process is slow and manual or quick and automated really depends on how much legacy content was created by properly using standardized styles and templates. If practically none of the content was created using standard styles and templates, then there is a great deal of manual evaluation that must be done.
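The following sketch shows why consistent styles matter. It assumes the legacy document has already been extracted into pairs of style name and text, and it assumes a simple mapping from style names to XML elements. Both the style names and the element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from legacy paragraph style names to XML element names.
STYLE_TO_ELEMENT = {
    "Heading 1": "title",
    "Body Text": "para",
    "Numbered Step": "step",
}

# Hypothetical extract of a legacy document as (style name, text) pairs.
legacy_paragraphs = [
    ("Heading 1", "Processing a return"),
    ("Body Text", "Returns are accepted within 30 days."),
    ("Numbered Step", "Verify the receipt."),
    ("Normal + Bold 14pt", "Important!"),  # ad hoc formatting, no standard style
]

def chunk(paragraphs):
    """Convert styled paragraphs to XML; collect those needing manual review."""
    topic = ET.Element("topic")
    needs_review = []
    for style, text in paragraphs:
        element_name = STYLE_TO_ELEMENT.get(style)
        if element_name is None:
            # No rule for this style, so a person has to decide what it is.
            needs_review.append((style, text))
        else:
            ET.SubElement(topic, element_name).text = text
    return topic, needs_review

topic, needs_review = chunk(legacy_paragraphs)
topic_xml = ET.tostring(topic, encoding="unicode")
print(topic_xml)
print(needs_review)
```

Paragraphs with standard styles convert automatically, while the ad hoc one lands in the manual review list. The larger that list, the slower the conversion.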
The most important aspect of the chunking process is to have the people doing the chunking UNDERSTAND what they are doing. This is best accomplished by providing them with thorough training, support, and supervision. Consistency is the key. Select a single process, train everyone in that process and execute the process without exception.
NOTE: The importance of thorough and consistent content editing increases by several orders of magnitude when content is entered into the database. Enter it wrong once... use it wrong many times. Michael Hughes says it best:
“Organizations that implement highly configurable or customizable products need to rely on their software vendors to meet the early training needs of the planners and technicians. To the degree that they wish to own or control product configuration, customization, and the ongoing support of those modifications, they also need to be prepared to invest in the staff development required to enable those capabilities.”
There are two approaches to legacy content that are usually successful:
- Identify a small select team of designers who specialize in converting content. They do nothing else until the original body of required content has been put into the database.
- Spread the conversion duties across the entire design team. Each member converts documents alongside their other duties, for at least a fixed minimum number of hours per week.
The advantage of the first method is that you generally obtain a more consistent conversion with fewer errors. The advantage of the second method is that you train your entire group in the XML database and process. You also may learn some things early on that allow you to modify the database or your processes so that they are more applicable to your training.
As with any complex operation, when there are advantages, there are also risks. The risk inherent in the first method is that it may result in a fully functional content base but with no one trained to use it properly. The second method risks creating a database with so many inconsistencies that it is practically useless. The correct method for each organization depends upon the technical background of the team and their workload. Organizations with lower levels of technical proficiency and higher per capita workload generally do better with the first method.
Using chunked content
The theory of developing new documents from legacy components is fairly simple, if the repository is implemented properly. First, the designer needs to know what previous training this new training is similar to. This is accomplished by querying the database and seeing what existing content comes fairly close to the current need. If it is completely new and dissimilar from other training, then the designer gets nothing from the repository but templates. Having made a shrewd guess about some other similar training, the designer has to define how this new training is different from the similar training that has been identified.
One method of handling the query process is a Web page containing drop-down lists of field properties. Select values for five or six of these properties, add some more specific customizing terms, click SUBMIT, and get back a list of matching content. It is just like doing a Web search, except that the Web you are searching is a discrete database. What is returned from the search can take many different forms: FrameMaker documents, raw XML, Word documents or HTML. When the query results in more “hits” than desired, reformulate it to be more specific. If little or nothing results, try a more general query until you get the desired results.
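The narrowing and broadening of a query can be illustrated with a small in-memory index. The field names, values, and records below are invented for the example. A real system would run the equivalent search against the repository database.

```python
# Hypothetical repository index; field names and values are illustrative.
INDEX = [
    {"id": "c1", "audience": "sales", "product": "router", "format": "FrameMaker",
     "text": "Router offer pricing for the spring campaign"},
    {"id": "c2", "audience": "sales", "product": "router", "format": "HTML",
     "text": "Router installation overview for sales staff"},
    {"id": "c3", "audience": "support", "product": "router", "format": "XML",
     "text": "Router troubleshooting checklist"},
    {"id": "c4", "audience": "sales", "product": "modem", "format": "Word",
     "text": "Modem offer pricing"},
]

def search(properties, terms=()):
    """Match every selected drop-down property and every free-text term."""
    hits = []
    for record in INDEX:
        # Skip records that disagree with any selected property value.
        if any(record.get(field) != value for field, value in properties.items()):
            continue
        # Keep the record only if all customizing terms appear in its text.
        if all(term.lower() in record["text"].lower() for term in terms):
            hits.append(record["id"])
    return hits

broad = search({"product": "router"})
narrower = search({"product": "router", "audience": "sales"})
narrowest = search({"product": "router", "audience": "sales"}, terms=["pricing"])
print(broad, narrower, narrowest)
```

Each added property or term can only shrink the result list, and removing one can only grow it, which is what makes the reformulate-and-retry loop predictable for authors.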
The authoring process is iterative, a succession of repetitive operations performed to collect, modify and upload new content. (See Figure 2.)

Figure 2: The authoring process is an iterative cycle.
As time goes by, and the authors and production people get used to using the system to produce the required results, productivity increases and frustration decreases. There will be some people who simply cannot adjust to the new work methods, just as there were some very talented people who could produce marvelous typed documents but who could never quite make a word processor work right.
Some authoring environments, such as Epic Editor, work from the data structure to the content. At the beginning, these tools can be difficult for some designers to understand and use efficiently. After the designers become familiar with the database structure, they rapidly learn to navigate through the maze of information they encounter on cross-functional teams to find the parcels they want. In practice, authors working with common, standardized documents rapidly learn the five or six elements they must identify to generate the greater portion of their training. It is more difficult, in the beginning, than cutting and pasting content, but once you get into your stride, it becomes 10 times faster and easier to do your job. Even in a pure XML environment, designers still find the ability to easily query the database invaluable.
It should be noted here that no content management system can stand in for the designer’s knowledge and understanding of the corpus for which training is developed. XML has no real impact upon the analysis or discovery phases of new training development. XML is only a set of tools. Having the skills to manipulate those tools does not in and of itself result in training, any more than reading a manual makes you an expert.
How the content is organized into new instances is a question of authoring tools, not XML.

