TABLE INTEROPERABILITY

SGML Open Technical Research Paper 9501:1995

Eric Severson

Harvey Bingham

Permission to reproduce parts or all of this information in any form is granted to OASIS members provided that this information by itself is not sold for profit and that OASIS is credited as the author of this information.

To help address the existing interoperability issues when using tabular material (tables) in SGML implementations, SGML Open's Technical Committee formed a Table Interchange subcommittee to research these issues.

Because the CALS table model has proliferated widely, it was chosen as the initial starting point. Although it has evolved to the point of a de facto standard, the specification leaves a large number of semantics open to interpretation which in turn has made interoperability difficult to achieve. As its first major task, the Committee therefore set out to identify and document ambiguities in the CALS table model specifications, identify and document related interoperability issues between SGML Open vendor products, and lay the groundwork for developing a proposed clarification of the standard that will minimize ambiguity and maximize interoperability.

This paper summarizes the results of this initial work, identifies the sources of current interoperability issues for the CALS model, and summarizes the most common set of practices currently followed by SGML Open vendors.

1 Background

For the last several years, SGML users have been pointing out that there are major interoperability issues when using tabular material (tables) in SGML implementations. First, SGML itself does not prescribe any standard way of encoding tables, leaving that to individual applications which may have taken different, possibly incompatible approaches. Furthermore, even when an application standard has been defined (such as the U.S. Department of Defense CALS model), different vendors' products may handle the same table in different ways.

Recognizing the importance of this issue, SGML Open's Technical Committee formed a Table Interchange subcommittee in 1994 to research current interoperability issues and recommend changes that would resolve the bulk of these problems. The Committee's mission has involved two fundamental goals:

  • to promote the interoperability of SGML tables throughout the SGML Open vendor base; and

  • to suggest a framework for the next generation of SGML table markup, with a focus on data representation in addition to table geometry and formatting.

Because the CALS table model in particular has proliferated widely, appearing in a large number of other applications, it was chosen as the initial starting point. Although it has evolved to the point of a de facto standard, the CALS table model's design was never actually completed. The specification leaves a large number of semantics open to interpretation which in turn has made interoperability difficult to achieve.

As its first major task, the Committee therefore set out to identify and document ambiguities in the CALS table model specifications, identify and document related interoperability issues between SGML Open vendor products, and lay the groundwork for developing a proposed clarification of the standard that will minimize ambiguity and maximize interoperability.

This paper summarizes the results of this initial work, identifies the sources of current interoperability issues for the CALS model, and summarizes the most common set of practices currently followed by SGML Open vendors.

2 A brief introduction to the CALS table model

The present baseline CALS table model was officially released on 26 June 1993 as part of the U.S. Department of Defense SGML standard MIL-M-28001B. First released in 1990 as part of the previous MIL-M-28001A specification, it has now been adopted, sometimes with small modifications, in a large variety of non-military industry applications. These include commercial aerospace (ATA and AECMA), computer documentation (DocBook), automotive (J2008), semiconductors (Pinnacles), telecommunications, and many site-specific uses. Even within the Department of Defense, several versions have evolved (e.g., 38784C vs. 38784D vs. Navy MID, etc.).

Designed to handle a variety of complex tables in military technical documents, the CALS table model focuses on encoding two-dimensional row and column geometry with basic formatting features such as cell alignment, borders, and rotation. It anticipates complex cell content, such as multiple paragraphs, lists, and graphics, and even provides for one level of nested tables within cells. Other than providing a facility for logically naming columns (e.g., state and population rather than simply column 1 and column 2), it does not address semantic encoding of the content. Structurally, the CALS table model presumes that tables are made up of an optional title plus one or more TGROUPs, each of which has its own body (TBODY) and its own independent set of optional column headings (THEAD) and footings (TFOOT). The column definitions for each TGROUP are provided by COLSPEC elements, which can be inferred if desired, and special COLSPECs can be provided for THEAD and TFOOT.

Each THEAD, TBODY, and TFOOT section within a TGROUP is made up of a series of ROWs. ROWs are formed as a left-to-right sequence of cells (ENTRYs), which can contain mixtures of text, graphics, and more complex structural objects such as list items. An ENTRY may span more than one column (horizontal spanning) and more than one row (vertical spanning), depending on how its attributes are set. As a shortcut for defining horizontal spans, optional SPANSPEC elements can be added at the TGROUP level, each referring to spans across column names defined in COLSPECs.

A table cell can contain either a simple ENTRY or an ENTRYTBL, which is essentially a table-within-a-table. ENTRYTBLs cannot contain TGROUPs (they are implicitly a TGROUP themselves), and cells within an ENTRYTBL cannot contain other ENTRYTBLs. Also, although ENTRYTBLs may span more than one column, they are constrained to one row.

Cell formatting attributes are limited to text alignment (both horizontal and vertical), cell borders (rulings), and rotation. For all but the last of these there is a complex inheritance scheme that allows values to be defaulted and overridden at multiple levels.
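
As an illustration of the structure described above, the following minimal sketch builds a small table and summarizes its TGROUPs. It makes several assumptions that are not part of the CALS specification: the markup is written as well-formed lowercase XML (rather than SGML) so that Python's standard library can parse it, the cell values are made up, and the helper name summarize is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative markup only: element and attribute names follow the ones
# named in the text; the content values are invented for the example.
SAMPLE = """
<table frame="all">
  <title>Population by state</title>
  <tgroup cols="2">
    <colspec colname="state" colwidth="1*"/>
    <colspec colname="population" colwidth="1*"/>
    <thead>
      <row><entry>State</entry><entry>Population</entry></row>
    </thead>
    <tbody>
      <row><entry colname="state">Alpha</entry><entry colname="population">100</entry></row>
      <row><entry colname="state">Beta</entry><entry colname="population">200</entry></row>
    </tbody>
  </tgroup>
</table>
"""


def summarize(table_markup):
    """Return one summary dict per TGROUP: declared COLS, named columns,
    and the number of ROWs in each of THEAD, TFOOT, and TBODY."""
    root = ET.fromstring(table_markup.strip())
    summary = []
    for tgroup in root.findall("tgroup"):
        summary.append({
            "cols": int(tgroup.get("cols")),
            "colnames": [c.get("colname") for c in tgroup.findall("colspec")],
            "rows": {
                section: len(tgroup.findall(f"{section}/row"))
                for section in ("thead", "tfoot", "tbody")
            },
        })
    return summary


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

A real SGML system would parse the table against the DTD; the XML rendition here is only a convenience for showing the TABLE, TGROUP, COLSPEC, THEAD, TBODY, ROW, and ENTRY nesting.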

3 The dimensions of tables interoperability

Interoperability potentially covers anything that would surprise a user when moving tables between different SGML systems. In essence, it has to do with answering the key question of the frustrated user: "If I spent time and energy getting this table right on one SGML system, why doesn't it automatically work on another?"

However, interoperability is not as straightforward as one might think. In fact, it is to some extent in the eyes of the beholder. Interoperability can be defined (and is defined by users) in at least two different ways:

  • Preservation of SGML syntax (the exact set of SGML tags and attributes must not change as a table is passed through multiple systems in sequence, i.e., imported and then re-exported at each step).

  • Preservation of format (the visual appearance of the table must not change as a table is passed through multiple systems in sequence, i.e., imported and then re-exported at each step).

In the first, data-centric view, detailed differences in format are tolerated when rendering across different systems. However, it is essential that original SPANSPEC definitions, for example, are faithfully carried through from input to output. It is also essential to maintain and allow edits to data (e.g., TFOOT) that is not supported for rendering purposes. Put another way, users are not surprised if their tables don't always look the same, but are upset if any of the detailed tags or attributes change when passed through multiple systems.

In the second, appearance-centric view, detailed differences in SGML syntax are tolerated when interchanging tables between different systems. It doesn't matter, for example, that original SPANSPECs are transformed into individual spanned cells, with a new set of SPANSPECs generated upon export. What is essential is that the table continues to look the same. Put another way, users are not surprised if detailed tags or attributes change when passed through multiple systems, but are upset if their tables don't always look the same.

Ideally we would achieve both kinds of interoperability simultaneously. However, in practice part of the interoperability problem comes from the difference in these two viewpoints, and from not understanding these differences. As vendors we must have a very clear definition that can be readily understood by users. But users must also understand the limits of any definition, realizing that interoperability is partly implementation and partly education.

Interoperability must also be carefully defined at a specific level. For example, one user might conclude format has been preserved if all cells maintain the same left, right, top, and bottom borders. Another might disagree because line widths are different between two systems. This becomes especially tricky when the difference between page-oriented and pageless systems is considered. "Looking the same" is not an intuitively obvious judgment.

Of course, the keys to interoperability must still be addressed through specific features and functional behavior of the products themselves. Thus another cut at interoperability, one which we have adopted here, breaks the issues down into three implementation-oriented categories:

  1. Differences in supported features (e.g., Vendor A supports ENTRYTBL, Vendor B does not; therefore tables have to be handled differently when moving data between systems).

  2. Differences in interpretation of supported features (e.g., both Vendor A and Vendor B support COLSEP, but Vendor A interprets it as the left separator, Vendor B interprets it as the right; therefore tables look different when moving data between systems). These differences may come in several forms:

    1. Differences in the specific semantic attached to a tag or attribute (e.g., does FRAME override the borders of outside cells, or does it constitute an entirely separate frame around the whole table?).

    2. Differences in the assumed defaults where none are specified (e.g., does #IMPLIED all the way to the top level mean that cell rulings are assumed on or off?).

    3. Differences in inheritance path/precedence order (e.g., does formatting on SPANSPEC override that of COLSPEC or vice versa?).

    4. Mismatches when dealing with redundant information (e.g., Vendor A uses COLNAME and ignores NAMEST, but Vendor B uses NAMEST and ignores COLNAME).

  3. Differences in definition and handling of error conditions (e.g., when defined cell alignment conflicts between the ENTRY and an applied SPANSPEC, Vendor A arbitrarily uses the SPANSPEC definition, Vendor B identifies an ambiguity and rejects the table; therefore an acceptable table on one system may be rejected by another).

In a perfect world, competent vendors acting in good faith would avoid any such differences between their products. However, Einstein's "God is in the details" applies here with a vengeance. Two products may easily seem to support the same feature set in the same manner; for example, each might support the use of COLSPECs and allow the number of COLSPECs to differ from the defined number of COLS. But upon deeper inquiry, it can turn out that one assumes the actual number of columns is defined by COLS, inferring COLSPECs if needed and ignoring any excess COLSPECs, while the other uses the number of COLSPECs to determine the actual number of columns, resetting COLS to match. Thus, even though things seem to match up at a high level, in fact a subtle but potentially quite serious interoperability problem exists.
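
The COLS versus COLSPEC example can be made concrete with a short sketch. The two function names are hypothetical, column definitions are reduced to a list of COLWIDTH strings, and the width 1* given to inferred COLSPECs is an assumption borrowed from the common interpretation summarized later in this paper.

```python
def columns_by_cols(cols, colspec_widths):
    """First product: COLS is authoritative. Missing COLSPECs are
    inferred (assumed width 1*) and excess COLSPECs are ignored."""
    widths = list(colspec_widths[:cols])
    widths += ["1*"] * (cols - len(widths))
    return widths


def columns_by_colspecs(cols, colspec_widths):
    """Second product: the COLSPECs are authoritative. COLS is
    effectively reset to the number of COLSPECs supplied."""
    return list(colspec_widths)


if __name__ == "__main__":
    declared_cols = 2
    colspecs = ["1*", "2*", "3*"]  # one more COLSPEC than COLS declares
    print(len(columns_by_cols(declared_cols, colspecs)))      # 2 columns
    print(len(columns_by_colspecs(declared_cols, colspecs)))  # 3 columns
```

Given the same input, the two readings produce tables with different numbers of columns, which is exactly the kind of mismatch that goes unnoticed in a feature checklist.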

Finally, in analyzing interoperability it is important to understand what the underlying model was actually intended to do. The CALS table model, for example, was purposely designed to deal only with the structural aspects of tabular information, together with basic presentation choices (e.g., rulings around cells, alignment, etc.). It was not meant to dictate precise rules for typesetting, composition or screen display, or the complex interactions between table and page layout. Nor does it specify how to handle errors. If we try to measure interoperability at a higher level of precision, we have already gone beyond the scope of the CALS table model itself.

3.1 Our definition of interoperability

We have chosen to define interoperability in terms of the ability to support an agreed-upon exchange feature set using an agreed-upon, unambiguous set of semantics. We assume that standards for identifying and processing error conditions must be included as part of the agreed-upon semantics.

We believe this provides a pragmatic working definition that vendors and users can both understand objectively. While not absolutely comprehensive, we think it will be sufficient to prevent any significant differences in the look and behavior of CALS tables when moving between SGML Open vendor products. Of course, because of the complex nature of the interoperability problem, it is also important to note that some differences will inevitably continue to exist between individual products. Success must be measured in terms of removing all significant issues, not in achieving absolute perfection.

4 Methodology for the study

After a brief review of the CALS standard and current vendor products, the Committee recognized that the interoperability issue was in fact made up of a large number of subtle ambiguities and small differences in interpretation. Taken individually, each problem was minor and could easily escape notice. When added together, however, they created an overall issue with major implications.

Because the essence of the problem was lurking in the details, the Committee felt it was imperative to take a very thorough, rigorous approach. Therefore we began by combing through the existing CALS table model and related tag/attribute definitions, attempting to identify every possible area of potential ambiguity or misunderstanding. From this baseline, we then created a highly detailed vendor questionnaire, consisting of over one hundred questions designed to pinpoint all possible areas of difference between products. These were in turn broken down into eight major categories:

  1. Backbone table structure/format.

  2. Row/column structure.

  3. Cell formatting - Column widths.

  4. Cell formatting - Rotation and alignment.

  5. Cell formatting - Cell borders.

  6. Cell formatting - Inheritance.

  7. Cell content.

  8. General.

Individual questions focused both on differences in the set of supported features across vendor products and on the way each vendor had interpreted the semantic details. The general questions were designed to elicit comments that might shed additional light on the issues from slightly different angles. Furthermore, to minimize the possibility of misunderstanding, we encouraged participants to attach comments of unlimited length to each response. A complete list of survey questions is contained in Appendix A, Survey questions.

All SGML Open vendors were invited to participate in filling out the questionnaire, with the goal of obtaining a large enough representative sample on which to base solid conclusions. In addition, we solicited input from other past and current members of the CALS technical committee that architected and now maintains the CALS table model. After this process was complete, we had obtained a wealth of information, including completed questionnaires for seven SGML Open vendors that provide authoring, publishing, and electronic viewing products that support CALS SGML tables. While a few vendors elected not to participate for various reasons, a vast majority of the largest and most experienced SGML vendors were included, and virtually all SGML Open vendors offering related products were members of our committee. Therefore we felt confident that our sample of seven vendors was a sufficient base for analysis.

After a few iterations in which vendors were allowed to obtain clarifications and refine their answers, results were formally tabulated in matrices and cross-analyzed to extract the key issues. See the detailed survey results in Appendix B, Detailed analysis of survey results. In the initial extraction process, we identified significant issues on the following basis:

  • If a specific feature appeared to be fully supported by fewer than two-thirds of our representative sample (i.e., by four or fewer vendors), this feature was identified as a significant issue.

  • If a specific semantic interpretation was not unanimously agreed upon, it was also identified as a significant issue.

A summary of the most commonly supported features was then constructed by excluding those features that had been identified as significant issues.

Similarly, a list of key ambiguities/differences was drawn up consisting of all specific semantics that had been identified as significant issues. A summary of most commonly supported semantic interpretations was formulated using the interpretation shared by the largest number of vendors in our sample.
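
The extraction rules above can be sketched as follows. The function names and the vendor counts used in the usage example are hypothetical and are not taken from the survey data; the only values drawn from the text are the sample size of seven and the two-thirds threshold. The text does not say how ties between interpretations were resolved, so this sketch simply returns the first of the tied candidates.

```python
from fractions import Fraction

SAMPLE_SIZE = 7                 # vendors in the survey sample
THRESHOLD = Fraction(2, 3)      # "fewer than two-thirds" rule


def significant_feature_issues(support_counts):
    """Features fully supported by fewer than two-thirds of the sample."""
    return sorted(
        name for name, count in support_counts.items()
        if Fraction(count, SAMPLE_SIZE) < THRESHOLD
    )


def is_semantic_issue(votes):
    """A semantic interpretation is an issue unless it is unanimous."""
    return len([n for n in votes.values() if n > 0]) > 1


def most_common_interpretation(votes):
    """Interpretation shared by the largest number of vendors
    (ties are not addressed by the text; the first maximum wins here)."""
    return max(votes, key=votes.get)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(significant_feature_issues({"feature A": 4, "feature B": 5}))
    print(most_common_interpretation({"reading 1": 5, "reading 2": 2}))
```

With seven vendors, four supporters (4/7) fall below two-thirds while five (5/7) do not, which matches the "four or fewer vendors" wording of the first rule.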

5 Summary of findings

Vendors agreed on the great bulk of table features and semantic interpretations included in the survey. As expected, a number of detailed differences surfaced between the products surveyed. However, most of these were relatively subtle. Following our working definition of tables interoperability, we separated these into two fundamental categories, summarized below:

  1. Differences in the specific set of features supported (e.g., some vendors support ENTRYTBL, some do not).

  2. Differences in interpreting tag/attribute semantics (e.g., in the absence of a strict definition, vendors have made slightly different assumptions about how cell format attributes are inherited and overridden).

A summary of these results in matrix form can be found in Appendix B, Detailed analysis of survey results.

5.1 Unsupported Features

Our analysis shows the following features are generally unsupported:

  1. Backbone Table Structure/Format: TFOOT, ORIENT attribute, PGWIDE attribute, TABSTYLE/TGROUPSTYLE attributes.

  2. Row/Column Structure: Preservation of SPANSPEC/COLNAMEs from import through export, separate COLSPECs at the THEAD and TFOOT level, ability to redefine columns within THEAD and TFOOT.

  3. Cell Formatting - Column Widths: decimal values for proportional COLWIDTHs, support for mixed COLWIDTHs (proportional plus fixed measure within same column).

  4. Cell Formatting - Rotation and Alignment: ROTATE attribute, CHAROFF attribute.

  5. Cell Formatting - Cell Borders: None.

  6. Cell Formatting - Inheritance: Preservation of inheritance structure from import through export, inheritance of cell alignment/cell borders from SPANSPEC.

  7. Cell Content: ENTRYTBL, and when supported multiple TGROUP equivalents within ENTRYTBL.

5.2 Differences in Interpretation

Our analysis shows there are differences in interpretation in the following areas:

  1. Backbone Table Structure/Format: FRAME as override for outer cell borders, default for FRAME attribute.

  2. Row/Column Structure: Whether number of columns is determined by COLS attribute or set of COLSPECs, interpretation when these are different, ability for COLSPECs/ENTRYs to be in non-sequential (i.e., other than strictly increasing) order, whether gaps are allowed between ENTRYs, use of SPANSPECs in THEAD/TFOOT, precedence of COLNAME/NAMEST/NAMEEND/SPANSPEC, error handling for logical inconsistencies.

  3. Cell Formatting - Column Widths: list of allowed fixed unit values, default fixed unit, default method for determining COLWIDTH.

  4. Cell Formatting - Rotation and Alignment: default for ALIGN.

  5. Cell Formatting - Cell Borders: defaults for COLSEP/ROWSEP, interpretation of FRAME attribute.

  6. Cell Formatting - Inheritance: Precedence order of inheritance paths for ALIGN, VALIGN, ROWSEP and COLSEP, precedence order using COLSPECs within THEAD/TFOOT.

  7. Cell Content: Processing of ENTRYTBLs.

6 Summary of most common practices

As a means of laying the groundwork for establishing table interoperability between vendor products, this section summarizes the practices and commonly supported interpretations by SGML Open vendors. These include:

  • most commonly supported features;

  • features not commonly supported;

  • most common semantic interpretations.

6.1 Summary of most commonly supported features

The following represents the set of features most commonly supported by SGML Open vendors at the time of our survey. Please note that not all vendors support each of these features in every case. (See the detailed survey results in Appendix B, Detailed analysis of survey results.)

6.1.1 Backbone Table Structure/Format

  1. Support for multiple TGROUPs within the same table.

  2. Use of THEAD with auto-replication across pages.

  3. TBODY element.

  4. Use of the FRAME attribute on TABLE (all defined values: TOP, BOTTOM, TOPBOT, ALL, SIDES, NONE).

  5. Use of the COLS attribute on the TGROUP element.

6.1.2 Row/Column Structure

  1. Use of COLSPEC to define column specifications.

  2. Horizontal spans by using SPANSPECs referenced by SPANNAME, and alternatively by explicit NAMEST/NAMEEND attributes on individual cells.

  3. Vertical spans using MOREROWS on individual cells or horizontally spanned cells.

6.1.3 Cell Formatting

  1. Either fixed or proportional COLWIDTHs for individual columns (e.g., column 1 has width 1* and column 2 has width 1in).

  2. Decimal values for fixed COLWIDTHs.

  3. Support for ALIGN attribute (all defined values: LEFT, RIGHT, CENTER, JUSTIFY, CHAR).

  4. Support for CHAR attribute to specify value for CHAR alignment.

  5. Support for VALIGN attribute (all defined values: TOP, MIDDLE, BOTTOM).

  6. Support for ROWSEP and COLSEP attributes.

  7. Full inheritance of cell format attributes other than through SPANSPEC.

6.1.4 Cell Content

  1. Pure text within table cells, including multiple structured textual objects (e.g., lists, warnings, etc.).

  2. Single notation object (e.g., equation or graphic), not mixed with text, within table cells.

  3. Mixed text and graphics in table cells, but with no explicit relative positioning among graphics and text (as the size of textual material presentation is system-dependent).

6.2 Features not commonly supported

The following features were not found to be commonly supported at the time of our survey.

6.2.1 Backbone Table Structure/Format

  1. Use of the TFOOT element.

  2. Use of ORIENT, PGWIDE, and TABSTYLE attributes on the TABLE element.

  3. Use of the TGROUPSTYLE attribute on the TGROUP element.

6.2.2 Row/Column Structure

  1. Preservation of SPANSPEC/COLNAMEs from import through export (i.e., preservation of precisely the same set of named COLSPECs and SPANSPECs, as opposed to exporting a format-equivalent but different set).

  2. Separate COLSPECs at the THEAD and TFOOT level (allowing redefinition of columns within THEAD and TFOOT).

  3. Ability for COLSPEC and ENTRY elements to be in non-sequential (i.e., other than strictly increasing) order.

  4. Ability for gaps to occur between ENTRYs in a table row (e.g., a situation in which explicit ENTRY elements are included for columns 1 and 3, thereby forcing an ENTRY element for column 2 to be inferred).

6.2.3 Cell Formatting

  1. Use of decimal values for proportional COLWIDTHs.

  2. Support for mixed COLWIDTHs (proportional plus fixed measure within same column).

  3. Use of ROTATE and CHAROFF attributes.

  4. Use of SPANSPECs to specify or override cell formatting attributes.

  5. Preservation of inheritance structure from import through export (i.e., preservation of precisely the same set of attribute values on the same elements throughout the table structure, as opposed to exporting a format-equivalent but different set).

  6. Inheritance of cell format attributes through SPANSPEC.

6.2.4 Cell Content

  1. Preservation of precise positioning of mixed text and graphics elements within table cells.

  2. Use of ENTRYTBL element.

6.3 Summary of most common semantic interpretations

The following represents a consolidated list of semantic interpretations most commonly followed by SGML Open vendors. Please note that not all vendor products currently implement all of these interpretations in each case. (See the detailed survey results in Appendix B, Detailed analysis of survey results.)

6.3.1 Backbone Table Structure/Format

  1. The FRAME attribute is interpreted as determining the outer cell borders for the top, bottom, left, and right sides of the table. Specifically, FRAME is the only way to turn the left and top outer cell borders of the table on or off, and it overrides settings at lower levels for the outer cell borders on the bottom and right sides of the table. It is not interpreted as specifying a separate frame outside the boundaries of the defined cells.

  2. If not provided by a style specification, the #IMPLIED value for the FRAME attribute (i.e., at the TABLE level) is interpreted as ALL.

  3. The ORIENT attribute LAND value is interpreted as 90 degrees counterclockwise from the current page orientation. The LAND value only rotates the table content, not the page header/footer or other text outside the table element.

  4. If not provided by a style specification, the #IMPLIED value for the ORIENT attribute (i.e., at the TABLE level) is interpreted as PORT (no relative rotation).

  5. The PGWIDE attribute 1 (yes) value is interpreted as defining a table presentation that spans multiple text columns to fill the entire width of the page. If ORIENT is set to PORT, a value of 0 (no) is interpreted as defining a table presentation that spans only the width of the current text column. If ORIENT is set to LAND then PGWIDE is not meaningful.

  6. If not provided by a style specification, the #IMPLIED value for the PGWIDE attribute (i.e., at the TABLE level) is interpreted as 1 (width of page).

  7. The TABSTYLE attribute (on TABLE) and the TGROUPSTYLE attribute (on TGROUP) are interpreted respectively as providing a hook to apply style information for either the table as a whole or a specific TGROUP.
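
The FRAME interpretation in items 1 and 2 can be sketched as a lookup from the attribute value to the set of outer cell borders that are drawn. The function name outer_borders and the style_default parameter (standing in for a value supplied by a style specification) are hypothetical; the six values and the fallback to ALL come from the text.

```python
ALL_SIDES = frozenset({"top", "bottom", "left", "right"})

# Outer cell borders switched on by each defined FRAME value.
FRAME_SIDES = {
    "TOP": frozenset({"top"}),
    "BOTTOM": frozenset({"bottom"}),
    "TOPBOT": frozenset({"top", "bottom"}),
    "ALL": ALL_SIDES,
    "SIDES": frozenset({"left", "right"}),
    "NONE": frozenset(),
}


def outer_borders(frame=None, style_default=None):
    """Return the outer borders for a table.

    frame is the FRAME attribute on TABLE (None means #IMPLIED);
    style_default is the value a style specification would supply, if any.
    With neither present, the common interpretation is ALL.
    """
    value = frame or style_default or "ALL"
    return FRAME_SIDES[value.upper()]


if __name__ == "__main__":
    print(sorted(outer_borders()))          # #IMPLIED, no style: all four sides
    print(sorted(outer_borders("sides")))   # left and right only
```

Because FRAME governs the outer cell borders themselves, a renderer following this reading would apply the result to the edge cells rather than drawing a separate box around the table.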

6.3.2 Row/Column Structure

  1. The number of columns is determined by the COLS attribute on the TGROUP element, not by the number of COLSPECs actually defined. If the number of COLS is larger than the number of COLSPECs, then additional COLSPECs with COLWIDTH 1* are inferred (all other attributes left as #IMPLIED). COLSPECs can be numbered or unnumbered, and are in sequential order. Unnumbered or inferred COLSPECs are implicitly numbered incrementally (one more than the previous column number), with the first COLSPEC starting at 1. Any logical inconsistency in COLSPEC numbering, or a number of COLSPECs greater than COLS, is interpreted as an error.

  2. If not provided by a style specification, the #IMPLIED value for the COLWIDTH attribute in explicit COLSPECs is interpreted as proportional value 1*.

  3. ENTRYs can be tied to a specific column, or have their column inferred. In either case, they are in strictly increasing column order within a row. Unnumbered ENTRYs are interpreted as being numbered incrementally (one more than the column number of the last previously used column in the row). Note that columns may already be in use due to the completion of a horizontal span from an entry earlier in this row or a MOREROWS=N on an entry in a prior row. The first ENTRY of a row starts in the first unused column of this row (usually column 1 unless it is already in use due to a vertical span from a prior row). Any logical inconsistency in column numbering is interpreted as an error. So is a number of ENTRYs greater than the number of columns, less the number of columns in the row consumed by MOREROWS=N encroachment from a prior row, less the sum over all spans in the row of the number of columns in each span minus 1.

  4. A row into which a vertical straddle occurs because of a MOREROWS=N on an entry from a prior row has no entry in that column. Therefore an entry without a specific column falls into the next non-vertically straddled column.

  5. When determining the column placement of any given ENTRY or ENTRYTBL, a horizontal span always takes precedence over a single column specification. For example, if both SPANNAME and COLNAME are specified, the SPANNAME takes precedence.

  6. If a horizontal span is desired, this may be accomplished either by using SPANNAME or by using an explicit NAMEST and NAMEEND without specifying SPANNAME. If both SPANNAME and NAMEST/NAMEEND are specified, NAMEST/NAMEEND are ignored. Note that there is no consensus that COLNAME can be used as part of specifying a span.

  7. If neither SPANNAME nor an explicit NAMEST/NAMEEND pair is specified, then the ENTRY or ENTRYTBL is placed within a single column. In this case, if NAMEST is specified it takes precedence over any COLNAME value. If NAMEST is not present, COLNAME is used. If neither is specified, the entry goes into the next available column in the row.

  8. COLSPEC and SPANSPEC names are not allowed to overlap (i.e., they share the same name space within each TGROUP or ENTRYTBL).
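
The placement rules in items 3 through 7 can be sketched as a small row-placement routine. This is one possible reading under stated assumptions, not a reference implementation: entries are plain dicts of lowercase attribute names, colnum maps column names to column numbers, spans maps each SPANSPEC name to its (NAMEST, NAMEEND) pair, and occupied holds the columns already taken by MOREROWS spans from prior rows. The function names are hypothetical. Gaps between entries are accepted here, although the survey found that gaps are not commonly supported.

```python
def resolve_columns(entry, colnum, spans):
    """Return (first, last) column numbers, or None if no column is named."""
    if "spanname" in entry:                        # a span always wins (items 5, 6)
        namest, nameend = spans[entry["spanname"]]
        return colnum[namest], colnum[nameend]
    if "namest" in entry and "nameend" in entry:   # explicit span (item 6)
        return colnum[entry["namest"]], colnum[entry["nameend"]]
    name = entry.get("namest") or entry.get("colname")  # NAMEST beats COLNAME (item 7)
    if name is not None:
        return colnum[name], colnum[name]
    return None


def place_row(entries, ncols, colnum, spans, occupied=frozenset()):
    """Assign each ENTRY of one row to its column range.

    Raises ValueError for the error conditions of item 3: columns out of
    strictly increasing order, columns past COLS, or a collision with a
    vertical span from a prior row.
    """
    placed = []
    last_used = 0
    for entry in entries:
        cols = resolve_columns(entry, colnum, spans)
        if cols is None:
            first = last_used + 1
            while first in occupied:               # skip straddled columns (item 4)
                first += 1
            cols = (first, first)
        first, last = cols
        if first <= last_used or last < first or last > ncols:
            raise ValueError(f"entry cannot be placed in columns {first}-{last}")
        if any(c in occupied for c in range(first, last + 1)):
            raise ValueError(f"columns {first}-{last} overlap a vertical span")
        placed.append(cols)
        last_used = last
    return placed


if __name__ == "__main__":
    colnum = {"c1": 1, "c2": 2, "c3": 3}
    spans = {"s12": ("c1", "c2")}
    # SPANNAME overrides COLNAME; the unnamed entry falls into column 3.
    print(place_row([{"spanname": "s12", "colname": "c3"}, {}], 3, colnum, spans))
```

A product that instead rejects gaps, or that lets COLNAME override SPANNAME, would differ from this sketch only in resolve_columns and the error checks, which is where the survey found most of the disagreement.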

6.3.3 Cell Formatting

  1. The standard list of allowed fixed unit values for COLWIDTH is pt (points), cm (centimeters), mm (millimeters), pi (picas), and in (inches). The default fixed unit is interpreted as pt if a unit is not specified.

  2. If not otherwise specified, COLWIDTH is always assumed to be proportional with value 1*.

  3. Cell formatting defined explicitly through attributes on ENTRY elements takes precedence over any other specification. However, for list or other paragraph styles, the align formatting is consistent with those styles for those elements occurring outside the table, but constrained to a presentation area that is the entry width (unless the style of such elements is explicitly determined otherwise by a style specification).

    If not defined explicitly on ENTRY, a formatting attribute is determined by following an inheritance scheme going from higher to lower levels (as defined here). In this case, the lowest level at which the attribute has been defined takes precedence over the value derived from higher levels (e.g., ALIGN defined on COLSPEC or SPANSPEC overrides any default defined on TGROUP). If an attribute is never explicitly defined, and the CALS table model does not provide an explicit default value at the highest level, then the attribute's value is defaulted from an accompanying style specification if possible. If there is no such specification or default provided, an agreed-upon default is assumed as specified herein.

  4. COLSPEC and SPANSPEC attribute values are themselves defaulted in the same manner from higher levels. Inferred COLSPECs are assumed to default all their formatting attributes from higher levels.

  5. When applying the inheritance schemes, an ENTRY inherits its formatting attributes through the appropriate COLSPEC if either NAMEST or COLNAME is used to determine the column, and through the appropriate SPANSPEC if SPANNAME is used to determine the column (and span).

    If inheritance is through the SPANSPEC, then the SPANSPEC itself inherits its default values through the COLSPEC corresponding to NAMEST (the first column of the span).

    Similarly, if an ENTRY uses NAMEST and NAMEEND to specify a horizontal span (as opposed to SPANNAME), then the ENTRY inherits its formatting attributes through the COLSPEC corresponding to NAMEST (the first column of the span).

  6. The inheritance path for ALIGN, CHAR, and CHAROFF is ENTRY<COLSPEC/SPANSPEC<TGROUP.

  7. The inheritance path for VALIGN is ENTRY<ROW<THEAD/TBODY/TFOOT.

  8. The inheritance path for ROWSEP is ENTRY<ROW<COLSPEC/SPANSPEC<TGROUP<TABLE.

  9. The inheritance path for COLSEP is ENTRY<COLSPEC/SPANSPEC<TGROUP<TABLE.

  10. The #IMPLIED value for the ALIGN attribute (i.e., at the TGROUP level) is interpreted as LEFT.

  11. A structured text element (e.g., list item) within an ENTRY gets its relative alignment directly from that element's format specification, rather than from the ALIGN attribute for the entire ENTRY.

  12. The #IMPLIED value for the VALIGN attribute (i.e., at the THEAD/TBODY/TFOOT level) is interpreted as BOTTOM for THEAD, and TOP for TBODY and TFOOT.

  13. The CHAR and CHAROFF attributes shall be meaningful for an ENTRY only when the related ALIGN attribute is set to char alignment.

  14. The CHAR attribute shall contain only a single character to which the column is aligned. If this character occurs more than once in cell content, its leftmost occurrence is used for alignment. If this character does not occur in a specific cell, then that cell content shall right-align to the left of the offset point specified by CHAROFF (50 percent if CHAROFF is not used). The value for CHAR shall not be an SDATA entity.

  15. The #IMPLIED value for the CHAR attribute (i.e., at the TGROUP level) is (no character). Defaulting to no character shall be interpreted as causing the content to right-align to the left of the offset point specified by CHAROFF (50 percent if CHAROFF is not used). This is equivalent to the behavior when an individual cell does not contain the specified CHAR value.

  16. The CHAROFF percentage is measured to the left edge of the first CHAR (the alignment character) in the entry content.

  17. The #IMPLIED value for the CHAROFF attribute (i.e., at the TGROUP level) is interpreted as 50 (alignment occurs at 50 percent of the current cell width).

  18. COLSEP and ROWSEP are interpreted as adding a single light ruling respectively to the right and bottom sides of the cell. (Note that until more descriptive attributes are added for cell rulings, a single light ruling is still somewhat ambiguous and could result in slightly different visual appearances across systems.)

  19. If not provided by a style specification, the #IMPLIED value for the COLSEP attribute (i.e., at the TABLE level) is interpreted as 1 (rulings between columns).

  20. If not provided by a style specification, the #IMPLIED value for the ROWSEP attribute (i.e., at the TABLE level) is interpreted as 1 (rulings between rows).

  21. COLSEP values for the last column, and ROWSEP values for the last row, are always overridden by the FRAME attribute on TABLE, whether the value of FRAME is explicitly provided or is defaulted.

  22. The ROTATE attribute (on ENTRY) value of 1 (yes) is interpreted as rotating the ENTRY 90 degrees counterclockwise from the current table cell orientation, with a default of 0 (no relative rotation). Thus rotation is cumulative for a LAND table.
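As an illustration only, the inheritance paths in items 6 through 9, the agreed defaults in items 10, 12, 15, 17, 19 and 20, and the character-alignment rules in items 14 through 16 can be sketched in Python. The function names (resolve, char_align_indent), the dictionary layout, and the level name colspec_or_spanspec are hypothetical and are not part of the CALS table model. The second function also assumes a monospaced cell measured in character positions, which the rules above do not specify.

```python
# Inheritance paths from items 6-9, most specific level first.
# "colspec_or_spanspec" stands for whichever of COLSPEC or SPANSPEC applies
# to the entry under item 5; the caller is assumed to have selected it.
_ALIGN_PATH = ["entry", "colspec_or_spanspec", "tgroup"]
PATHS = {
    "align": _ALIGN_PATH,
    "char": _ALIGN_PATH,
    "charoff": _ALIGN_PATH,
    "valign": ["entry", "row", "section"],  # section = THEAD/TBODY/TFOOT
    "rowsep": ["entry", "row", "colspec_or_spanspec", "tgroup", "table"],
    "colsep": ["entry", "colspec_or_spanspec", "tgroup", "table"],
}

# Agreed-upon defaults from items 10, 15, 17, 19 and 20.
DEFAULTS = {"align": "left", "char": "", "charoff": 50, "colsep": 1, "rowsep": 1}
# Item 12: the VALIGN default depends on the containing section.
VALIGN_DEFAULTS = {"thead": "bottom", "tbody": "top", "tfoot": "top"}


def resolve(attr, levels, section="tbody", style=None):
    """Return the effective value of attr for one ENTRY.

    levels maps a level name to the attributes set explicitly at that level;
    style is an optional dict standing in for a style specification.
    """
    for level in PATHS[attr]:
        value = levels.get(level, {}).get(attr)
        if value is not None:
            return value  # lowest level that defines the attribute wins
    if style is not None and attr in style:
        return style[attr]
    if attr == "valign":
        return VALIGN_DEFAULTS[section]
    return DEFAULTS[attr]


def char_align_indent(text, cell_width, char="", charoff=50):
    """Left indent of text, in character positions, for ALIGN=char."""
    offset_point = cell_width * charoff / 100
    pos = text.find(char) if char else -1  # leftmost occurrence (item 14)
    if pos < 0:
        # No CHAR, or CHAR absent from this cell: right-align the content
        # to the left of the offset point (items 14 and 15).
        return offset_point - len(text)
    # Item 16: the left edge of the alignment character sits on the offset point.
    return offset_point - pos
```

For example, with ALIGN set to right on a COLSPEC and center on the TGROUP, resolve("align", ...) yields right for an ENTRY that sets nothing itself. A negative result from char_align_indent means the content would overflow the left edge of the cell; the rules above do not say how that case is handled.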

6.3.4Cell Content

  1. Only one ENTRYTBL is permitted in a single table cell, despite the current content model which would permit multiple ENTRYTBLs.

  2. NAMEST, NAMEEND, COLNAME and SPANNAME attributes for ENTRYTBL are interpreted exactly as for an ENTRY.

  3. The COLS attribute for an ENTRYTBL is interpreted exactly as at the TGROUP level for the table as a whole. Within the ENTRYTBL, the same rules for interpreting and inferring COLSPECs and ENTRYs are used as for the table as a whole.

  4. The TGROUPSTYLE attribute for ENTRYTBL is interpreted exactly as at the TGROUP level for the table as a whole (i.e., it provides a hook to apply style information to the specific ENTRYTBL—which implicitly contains one TGROUP).

  5. COLSEP and ROWSEP attributes on an ENTRYTBL element are interpreted exactly as for an ENTRY, defining respectively the right and bottom side rulings for the ENTRYTBL as a whole. If not provided explicitly, these values are inherited from higher levels, and will be overridden by the FRAME attribute for the table if the ENTRYTBL occurs in the last column and/or last row of the main table.

    The ultimate defaults for COLSEP and ROWSEP within the ENTRYTBL are interpreted as 1 (as is true at the TABLE level), independently from the COLSEP and ROWSEP values on the ENTRYTBL element itself.

  6. ENTRYTBLs always fill the entire width of the containing column or span. ALIGN, CHAR, and CHAROFF attributes on an ENTRYTBL element are not to be interpreted as influencing the alignment of the ENTRYTBL as a whole.

  7. All format attributes on an ENTRYTBL element (i.e., ALIGN, CHAR, CHAROFF, COLSEP, and ROWSEP) are assumed to also set default values for the cells within the ENTRYTBL (similar to such attributes at the TABLE and/or TGROUP level for the main table). If not set explicitly, these values are themselves defaulted through inheritance paths exactly as for an ENTRY in the main table.

  8. Within the ENTRYTBL, inheritance paths and overrides are determined using the same rules as for the table as a whole.

  9. There is no implication that multiple ENTRYTBLs in the same row will have the same number of subrows or their subrows aligned.

7Next steps

As a result of this study, the Committee plans to propose an SGML Technical Resolution that will provide a common definition of table interoperability using the CALS model. We are also sharing our recommendations with the CALS Electronic Publishing Committee (EPC) as input to improving the CALS table model and its documented semantics. When this phase of the Committee's work is complete, we will move on to the second goal in our mission statement: suggesting a standard framework and set of approaches for the next generation of SGML table markup. This work will explore where the current CALS table model falls short, going beyond format and layout issues to a model which captures the author's intent for underlying table data in an unambiguous and interchangeable way. We expect this may include a set of standard approaches and DTD fragments for different purposes.

ASurvey questions

The following questions were submitted to all SGML Open Member companies in order to find commonality and differences in the implementations of the CALS Table Model. Detailed answers identified more questions. Several rounds of clarification and augmentation of these questions occurred.

The results for the questions that bear on interoperability are grouped, in a slightly different order, in Appendix B, Detailed analysis of survey results. There they are paraphrased and shown with the percentage of vendors indicating a lack of support or a difference in support.

1Product questions

  1. Do you offer a product (or products) that

    1. accepts SGML-encoded tables

    2. creates SGML-encoded tables

    3. both?

  2. Please provide the name of the product(s) and a brief description including the role of SGML-encoded tables.

2Detailed questions: backbone table structure / format issues

  1. Do you support multiple TGROUPs (i.e. a redefinition of columns within a table)?

  2. Do you support a separate THEAD section?

  3. Do you support a separate TFOOT section?

  4. Do you support the ORIENT attribute for TABLE (both portrait and landscape)?

  5. If so, do you interpret landscape as 90 degrees counterclockwise from text alignment for the document?

  6. What do you assume as the default value if ORIENT is not specified? (note this is #IMPLIED)

  7. Do you support the PGWIDE attribute for TABLE?

  8. If so, how do you interpret its yes and no values?

  9. What do you assume as the default value if PGWIDE is not specified? (note this is #IMPLIED)

  10. Do you support the TABSTYLE attribute for TABLE? If so, how is it used?

  11. Do you support the TGROUPSTYLE attribute for TGROUP? If so, how is it used?

  12. Do you support the FRAME attribute for TABLE?

  13. If so, do you support all allowed FRAME values (top, bottom, topbot, all, sides, none)?

  14. How do you interpret FRAME (e.g. frame surrounding the whole table or as a definition of the side rulings for the table itself)?

  15. Do you assume ALL borders as the default value if FRAME is not specified? (note this is #IMPLIED.)

3Detailed questions: row / column structure issues

  1. Do you support horizontal spanning of cells across columns (NAMEST, NAMEEND)?

  2. Do you support SPANSPECs?

  3. Do you support separate COLSPECs at the THEAD level?

  4. Do you support separate COLSPECs at the TFOOT level?

  5. Do you allow the number of columns in a THEAD or TFOOT section to be different than the containing TGROUP?

  6. How do you resolve a situation where the total width of columns defined in a THEAD or TFOOT is different than the total width of the TGROUP?

  7. Do you support vertical spanning of cells across rows (MOREROWS)?

  8. Do you use proportional width columns for columns without COLSPECs?

  9. What happens if there are not as many COLSPECs defined as there are COLS for the TGROUP?

  10. Do you ignore COLSPECs in excess of the number defined by the COLS attribute?

  11. Do you allow some COLSPECs to be numbered and some not? (note COLNUM attribute is #IMPLIED)

  12. If yes, what rules do you use to resolve a situation where some COLSPECs are numbered and some are not? (note COLNUM attribute is #IMPLIED)

  13. How do you name COLSPECs you create?

  14. Do you map COLSPECs to table column using the COLNUM attribute?

  15. If yes, can the COLSPECs be out-of-order?

  16. Do you permit some ENTRYs to refer to COLSPEC names and some not (note COLNAME attribute is #IMPLIED)?

  17. What rules do you use to resolve a situation where some ENTRYs refer to COLSPEC names and some do not (note COLNAME attribute is #IMPLIED)?

  18. Do you allow SPANSPECs to be referred to within THEAD and TFOOT?

  19. If so, how do you ensure that such SPANSPECs only refer to COLSPEC definitions applicable to THEAD and TFOOT? (note COLSPECs but not SPANSPECs can be defined within THEAD and TFOOT)

  20. Do you allow overlap between COLSPEC and SPANSPEC names?

  21. If not, how do you ensure that they do not overlap?

  22. Do you ignore SPANNAME for ENTRYs that have both a NAMEST/NAMEEND and SPANNAME specified, or both a COLNAME and SPANNAME?

  23. Can ENTRYs be accepted in any order if they explicitly identify the column?

  24. When exporting a table, do you explicitly specify the COLNAME or NAMEST attribute for an entry immediately to the right of the spanned ENTRY?

  25. When a table cell is covered by a preceding ENTRY with MOREROWS specified, do you require the covered entry to be

    1. completely absent

    2. tagged but with no data content

    3. hidden by the spanning entry?

  26. How do you handle the situation where all ENTRYs in a row are covered in this manner? (note that the DTD requires at least one ENTRY in each row)

  27. Should the CALS table model be extended to include a concept of stub columns and stub hierarchies of first column(s) that would replicate for horizontal continuations of tables too wide to fit on a page?

  28. Do you believe the CALS table model should be extended to include a concept of titled subgroups (e.g. row heads)?

3.1Column Widths

  1. Do you support fixed COLWIDTHs?

  2. If so, what is the list of all possible units and their abbreviations?

  3. Size/distance paragraph. These are pi, pt, in, mm, cm, em. E.g. COLWIDTH = 2.5 inches is not allowed—it should be 2.5 in (and an appropriate error message is displayed).

  4. Do you allow both integer and decimal values?

  5. Are multiple units of measure allowed within the same table (e.g. column one measured in picas, but column two in inches)?

  6. Is there a default unit if none is specified?

  7. Do you support proportional COLWIDTHs?

  8. Do you allow both integer and floating point [ed. decimal] values?

  9. Do you support mixed (i.e. fixed plus proportional) COLWIDTHs?

  10. What are the rules used to resolve each column's actual width?

  11. Do you assume unit proportional measure as the default value if COLWIDTH is not specified? (note this is #IMPLIED)

3.2Rotation and Alignment

  1. Do you support the ROTATE attribute for ENTRYs?

  2. If so, do you interpret its meaning as 90 degrees counterclockwise from text alignment of the table?

  3. Do you support the ALIGN attribute?

  4. If so, do you support all allowed values (left, right, center, justify, char) including use of CHAR and CHAROFF attributes?

  5. Do you assume left as the default value if ALIGN is not specified? (note this is #IMPLIED at all levels)

  6. Do you support CHAR alignment?

  7. If CHAR is supported, do you have any limitation on the CHAR values that are allowable?

  8. Do you allow a sequence of more than one character as the CHAR value? (note this is defined as CDATA).

  9. Do you use the leftmost if the text to be aligned contains more than one occurrence of the specified CHAR value?

  10. Do you ignore the CHAR alignment if so much text occurs to the left of that character that it requires line-wrap?

  11. Does the CHAROFF percentage position to (l) left edge, (c) center, or (r) right of the CHAR?

  12. Do you support the VALIGN attribute?

  13. If so, do you support all allowed values (top, middle, bottom)?

3.3Cell Borders

  1. Do you support COLSEP and ROWSEP?

  2. If so, do you interpret COLSEP as the right border and ROWSEP as the bottom border?

  3. Do you provide single light border rulings for yes?

  4. Do you assume 1 (yes) as the default value if COLSEP is unspecified? (note it is #IMPLIED at all levels)

  5. Do you assume 1 (yes) as the default value if ROWSEP is unspecified? (note it is #IMPLIED at all levels)

  6. Do you use the FRAME attribute on TABLE to determine borders for the left side of the first column and the top of the first row?

  7. If not, how do you determine these borders?

  8. Do you also allow the FRAME attribute to override ROWSEP for cells in the last row, and to override COLSEP for cells in the last column?

3.4Inheritance

  1. Is the sequence for determining the ALIGN / CHAR / CHAROFF values ENTRY < ROW < COLSPEC < TGROUP?

  2. If not, what are the inheritance rules for ALIGN / CHAR / CHAROFF?

  3. Is the sequence for determining the VALIGN value ENTRY < ROW < (THEAD / TBODY / TFOOT)?

  4. If not, what are the inheritance rules for VALIGN?

  5. Is the sequence for determining the ROWSEP values ENTRY < ROW < SPANSPEC < COLSPEC < TGROUP < TABLE?

  6. If not, what are the inheritance rules for ROWSEP?

  7. Is the sequence for determining the COLSEP values ENTRY < SPANSPEC < COLSPEC < TGROUP < TABLE?

  8. If not, what are the inheritance rules for COLSEP?

  9. Does SPANSPEC formatting (alignment and borders) override ENTRY formatting attributes?

  10. Does formatting from a COLSPEC defined in THEAD or TFOOT override a COLSPEC for the same column defined at the TGROUP level?

  11. If COLSPEC from THEAD or TFOOT overrides that from TGROUP, does this apply to the cell alignment and border parameters?

  12. If COLSPEC from THEAD or TFOOT overrides that from TGROUP, does this apply to COLWIDTH?

4Detailed questions: cell content issues

  1. Do you support graphics within table ENTRYs?

  2. If so, are there any limitations on this other than those imposed by the DTD?

  3. How do you handle input documents that do not match these limitations?

  4. What happens if a graphic doesn't fit within the width of the table column (i.e. do you dynamically resize the image, clip it to fit, dynamically adjust the table geometry, etc.)?

  5. How do you handle a situation where a page break occurs in the middle of a graphic?

  6. Do you support other structural objects within table ENTRYs (e.g. warnings, cautions, notes, lists, content tags, emphasis, math, etc.)?

  7. If so, are there any limitations on this other than those imposed by the DTD?

  8. How do you handle input documents that do not match these limitations?

  9. Do you support ENTRYTBL (tables within tables)?

  10. If so, are there any limitations on this other than those imposed by the DTD?

  11. How do you handle input documents that do not match these limitations?

  12. Do you support multiple sets of column definitions within an ENTRYTBL (i.e. multiple TGROUP-like sections)? (note that the DTD allows one or more such sections)

  13. How do you interpret COLSEP / ROWSEP attributes at the ENTRYTBL level (i.e. as the default value for all ENTRYs within the ENTRYTBL, like such attributes at the TABLE or TGROUP level, or as setting the overall right and bottom borders, thus overriding COLSEPs for the last column and ROWSEPs for the last row, like the FRAME attribute at a TABLE level)?

  14. How do you interpret the ALIGN / CHAR / CHAROFF attributes at the ENTRYTBL level (i.e. as the default value for all ENTRYs within the ENTRYTBL, like such attributes at the TABLE or TGROUP level, or as somehow applying to the position of the ENTRYTBL within the column)?

  15. What happens if an ENTRYTBL doesn't fit within the width of the table column (i.e. do you dynamically resize its column widths, clip it to fit, dynamically adjust the table geometry, etc.)?

  16. How do you handle a situation where a page break occurs in the middle of an ENTRYTBL?

  17. How do you handle a situation where there are multiple ENTRYTBLs in the same row whose internal row boundaries in the different ENTRYTBLs don't match?

  18. Are inheritance rules for cell formatting attributes within an ENTRYTBL the same as those for the table as a whole?

  19. Do you believe the CALS table model should be extended to explicitly differentiate table footnotes from other footnotes?

  20. Can you export as an entity a table that came from a file entity back to the original entity?

5General questions

  1. Do you support import/export of the CALS 28001B table model?

  2. Please identify in detail any differences between the features supported on input versus output.

  3. On input, do you support

    1. any valid CALS 28001B table that parses against the DTD

    2. a more restricted subset?

  4. If the latter, how do you characterize this subset for your users (i.e. what are the rules for acceptable input)?

  5. What size limits exist in table handling?

  6. Do you support table features that go beyond the current limitations of the CALS table model?

  7. Based on your experience, do you believe there is a need to extend or modify the CALS table model? What specific changes would you like to see?

  8. Please list any interoperability issues of which you are aware between your CALS table support and that of any other product on the market.

  9. Do you support the CALS 28001B table model for non-CALS applications that have adopted it in their DTDs?

  10. If so, please describe the applications and indicate which of these have actually been implemented by your customers.

  11. What other SGML table models do you support?

BDetailed analysis of survey results

The survey questions in Appendix A, Survey questions, and the responses from seven vendors went through many iterations. The final results from the vendors in February 1995 are combined in two sets of tables.

  1. Unsupported features, indicating all places where vendors differ in support (showing the percentage of vendors not providing full support), with ISSUE status if more than one third provide no or incomplete support.1

  2. Differences in interpretation, indicating all places where there appear to be issues (showing the percentage of vendors with differing interpretations), with ISSUE status if there is any difference. Such differences in interpretation result from inadequate semantic description.2

From these two sets of tables, the significant issues were identified. The questions that lead to quantitative comparisons are paraphrased in the first column of the tables, the percentage scores appear in the second (larger scores are worse), and the (issue/non-issue) status appears in the third.

1Unsupported features

The original survey distinguished among:

  • lack of support,

  • important support limitations, and

  • full support.

The scores below show the percent of vendors that do not provide full support. ISSUE status is assumed if more than 1/3 of vendors fail to fully support a feature (score > 33%).

Table B.1 Backbone table structure / format
Backbone table structure / format                   Score   Status
Multiple TGROUPs (redefinition of columns)          29%
Separate THEAD section                              0%
Separate TFOOT section                              43%     ISSUE
ORIENT attribute (portrait vs. landscape)           57%     ISSUE
PGWIDE attribute (full page vs. column)             71%     ISSUE
TABSTYLE attribute (named styles)                   86%     ISSUE
TGROUPSTYLE attribute (named group styles)          86%     ISSUE
FRAME attribute (outer borders)                     14%

Table B.2 Row / column structure
Row / column structure                              Score   Status
Horizontal spans using SPANSPEC                     0%
Horizontal spans on cells using NAMEST/END          43%     ISSUE
Vertical spans using MOREROWS                       14%
Preservation of SPANSPEC import to export           57%     ISSUE
Preservation of COLNAMEs import to export           57%     ISSUE
SPANSPECs allowed in THEAD and TFOOT                0%
Separate COLSPECs at the THEAD level                57%     ISSUE
Separate COLSPECs at the TFOOT level                71%     ISSUE
Different number of columns in THEAD or TFOOT       86%     ISSUE

Table B.3 Cell formatting - column widths
Cell formatting - column widths                     Score   Status
Fixed COLWIDTHs                                     14%
Decimal values for fixed COLWIDTHs                  29%
Proportional COLWIDTHs                              0%
Decimal values for proportional COLWIDTHs           57%     ISSUE
Mixed COLWIDTHs (proportional/fixed in one column)  57%     ISSUE

Table B.4 Cell formatting - rotation and alignment
Cell formatting - rotation and alignment            Score   Status
ROTATE attribute (rotation of individual cells)     100%    ISSUE
ALIGN attribute - left/right/center values          14%
ALIGN attribute - justify value                     29%
ALIGN attribute - char value (with CHAR attr)       29%
CHAROFF attribute                                   57%     ISSUE
VALIGN attribute - top/middle/bottom values         29%

Table B.5 Cell formatting - cell borders
Cell formatting - cell borders                      Score   Status
COLSEP / ROWSEP attributes                          14%

Table B.6 Cell formatting - inheritance
Cell formatting - inheritance                       Score   Status
Preservation of inheritance import to export        57%     ISSUE
Inheritance of cell alignment from SPANSPEC         86%     ISSUE
Inheritance of cell borders from SPANSPEC           86%     ISSUE

Table B.7 Cell content
Cell content                                        Score   Status
Graphics within table ENTRYs                        0%
Mixture of text and graphics in table cells         29%
Other structural objects within table cells         14%
ENTRYTBL (tables within tables)                     57%     ISSUE
Multiple TGROUPs within ENTRYTBL                    71%     ISSUE

1.1Differences in interpretation

The original survey results distinguish among implementations where semantic descriptions were inadequate:

  • different interpretation,

  • potential difference, and

  • consistent interpretation.

ISSUE status is assumed if ANY vendor interprets things differently (score > 0%).

Table B.8 Backbone table structure / format
Backbone table structure / format                   Score   Status
ORIENT landscape as 90 degrees counter              0%
Default for ORIENT as no relative rotation          0%
PGWIDE yes as full page, no as column               0%
Default for PGWIDE as full page                     0%
TABSTYLE as style name for entire table             0%
TGROUPSTYLE as style names for TGROUPs              0%
FRAME as override for outer borders                 14%     ISSUE
Default for FRAME as all borders on                 14%     ISSUE

Table B.9 Row / column structure
Row / column structure                              Score   Status
Number of cols determined absolutely by COLS        43%     ISSUE
Create 1* cols if less COLSPECs than COLS           43%     ISSUE
Ignore excess if more COLSPECs than COLS            57%     ISSUE
COLSPECs allowed to be non-sequential               57%     ISSUE
Fit unnumbered COLSPECs next in sequence            57%     ISSUE
ENTRYs allowed to be non-sequential                 29%     ISSUE
Fit unnumbered ENTRYs next in sequence              43%     ISSUE
SPANSPECs in head/foot refer to local COLSPECs      43%     ISSUE
COLSPEC and SPANSPEC names allowed to overlap       29%     ISSUE
Precedence COLNAME->NAMEST/END->SPANSPEC            100%    ISSUE
Error if covered ENTRY present with content         14%     ISSUE

Table B.10 Cell formatting - column widths
Cell formatting - column widths                     Score   Status
Allowed fixed units exactly: IN, CM, MM, PI, PT     100%    ISSUE
Default fixed unit as PT                            86%     ISSUE
Default for COLWIDTH as unit proportional (1*)      14%     ISSUE

Table B.11 Cell formatting - rotation and alignment
Cell formatting - rotation and alignment            Score   Status
Default for ALIGN as left                           29%     ISSUE
CHAR can be any ASCII character, but no SDATA       0%
CHAR can be a single character only                 0%
Align on leftmost occurrence of CHAR                0%
Align to left side of char bounding box             14%     ISSUE

Table B.12 Cell formatting - cell borders
Cell formatting - cell borders                      Score   Status
COLSEP as right border / ROWSEP as bottom           0%
COLSEP / ROWSEP yes as single light ruling          0%
Default for COLSEP / ROWSEP as yes                  14%     ISSUE
FRAME attribute determines left/top border          0%
FRAME attribute overrides right/bottom border       14%     ISSUE

Table B.13 Cell formatting - inheritance
Cell formatting - inheritance                       Score   Status
ALIGN precedence as entry<span<col<tgrp             57%     ISSUE
VALIGN precedence as entry<row<thead/body/foot      43%     ISSUE
ROWSEP precedence as entry<row<span<col<tgrp<tbl    86%     ISSUE
COLSEP precedence as entry<span<col<tgrp<tbl        71%     ISSUE
Local COLSPECs override for thead / tfoot           71%     ISSUE

Table B.14 Cell content
Cell content                                        Score   Status
Graphics that don't fit resized not clipped         14%     ISSUE
ENTRYTBL format attrs apply only within             0%
ENTRYTBLs that don't fit resized not clipped        14%     ISSUE
Inheritance rules for ENTRYTBL same as table        0%
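
The two ISSUE rules used in this appendix can be illustrated with a short Python sketch. It assumes that each score is the number of flagged vendors out of the seven respondents, rounded to the nearest whole percent. The paper does not state how the percentages were computed, although every score in the tables is consistent with that assumption. The function name summarize and its parameters are hypothetical.

```python
def summarize(flagged, total=7, any_difference=False):
    """Return (score_percent, status) for one table row.

    flagged is the number of vendors without full support (unsupported
    features) or with a differing interpretation (any_difference=True).
    total=7 reflects the seven responding vendors; the rounding rule is
    an assumption, not something the paper states.
    """
    percent = round(100 * flagged / total)
    if any_difference:
        is_issue = flagged > 0           # any differing vendor (score > 0%)
    else:
        is_issue = flagged * 3 > total   # more than one third of vendors
    return percent, "ISSUE" if is_issue else ""


# Unsupported-feature rows: 2 of 7 stays below the one-third threshold,
# 3 of 7 exceeds it.
print(summarize(2))
print(summarize(3))
# Interpretation rows: a single differing vendor is enough.
print(summarize(1, any_difference=True))
```

Under these assumptions, the only scores that can occur are 0, 14, 29, 43, 57, 71, 86 and 100 percent, which matches the values shown in Tables B.1 through B.14.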