FOSI vs. XSL-FO

Useful material for users and developers of the XML-based dynamic publishing system "Arbortext". All information in this wiki came from posts to the Arbortext Adepters mailing list that members of that list identified as extremely useful. It is collected here because the mailing list has moved around over the years. For more information, see the group FAQ.



Post by liz » Mon Jun 15, 2015 5:02 pm

Eliot Kimber suggests solutions to alleged shortcomings of XSL-FO
Last Updated: 2006-08-23

Question by N.N


I've been unable to find convenient ways to do the following in FO:
  • Determine page numbers for links in multi-volume books, short of putting all the other volumes at the end of the PostScript file and then using a Perl script to eliminate them before printing.
  • Smartly insert line breaks in table cells when a cell boundary overflow is about to occur. (I would also like to be able to provide a QA script to alert writers to this issue before they print.)

Eliot Kimber answers


Re: Point #1


There are a number of ways this could be done, depending on the FO engine you're using or the details of your workflow, but they all require some sort of two-pass process, and in no case would the FO standard itself provide what you need, for the simple reason that it's too implementation-specific.

In essence, what you need is a way to save off information about the pagination of a given document or document part so that you can then read that information on a second pass and use it in generating your final output.
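
As a rough illustration of that two-pass idea, the Python sketch below assumes the first pass has somehow produced one small "side file" per volume that maps element IDs to page numbers. The JSON layout, the function names load_side_files and resolve_link, and the volume labels are all hypothetical; none of the FO engines mentioned here is known to emit such a file.

```python
import json
from pathlib import Path


def load_side_files(paths):
    """Merge per-volume side files into one {target_id: (volume, page)} map.

    Assumed (hypothetical) side-file layout:
        {"volume": "2", "pages": {"sec-install": 17, "fig-4": 23}}
    """
    index = {}
    for path in paths:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
        for target_id, page in data["pages"].items():
            index[target_id] = (data["volume"], page)
    return index


def resolve_link(index, target_id, current_volume):
    """Return the text to print for a cross-reference on the second pass."""
    if target_id not in index:
        return "page ??"  # unresolved: flag it rather than guess
    volume, page = index[target_id]
    if volume == current_volume:
        return f"page {page}"
    return f"Volume {volume}, page {page}"
```

On the second pass, the transform that generates the final FO would call something like resolve_link for each cross-volume reference instead of relying on the formatter to resolve it.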

I've suggested to all the FO implementation vendors that I work with that they provide a way to emit this information into "side files". So far no dice, although both XSL Formatter and XEP let you save off an intermediate file that reflects the pagination. This could be used but these files tend to be huge (which is why I want side files).

Another approach is to examine the PDFs you've generated and get pagination information from there. Not too hard but not always the best thing. Ken Holman uses a trick where he generates pagination information into the first page of his PDF in a form that is then easy to extract using a simple Java PDF manipulation library. He runs two passes--the first creates the PDF with this page, the second extracts the information and generates the final PDF. He uses this for back-of-the-book index generation but the technique could be used for anything.

For one client we implemented a two-pass mechanism for generating lists of effective pages (LOEP) where, as part of the XSLT to generate the final FO, we used XSL Formatter's Java API to render each page set in order to get the number of pages in that set, from which we could then determine the pages to go into the LOEP. This creates what is essentially a two-pass process, but it is implemented as a single processing step in the tool chain. Because the intermediate FOs are never written out, we minimized processing time by avoiding any file I/O for the first pass.
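
The arithmetic behind that approach is simple once the first pass has reported how many pages each page set occupies. The Python sketch below assumes those counts are already available as (name, page_count) pairs; how they are obtained from the formatter is not shown, and the function name and the continuous-numbering assumption are illustrative only.

```python
def page_ranges(page_sets, first_page=1):
    """Turn (name, page_count) pairs into (name, first, last) page ranges.

    Assumes pages are numbered continuously across page sets, which may
    not match a real LOEP numbering scheme (for example, chapter-prefixed
    folios).
    """
    ranges = []
    next_page = first_page
    for name, count in page_sets:
        if count < 1:
            raise ValueError(f"page set {name!r} has no pages")
        ranges.append((name, next_page, next_page + count - 1))
        next_page += count
    return ranges
```

For instance, page sets of 4 and 10 pages would yield the ranges 1 to 4 and 5 to 14.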

I don't know if Epic's FO processor provides a similar sort of API.

Re: Point #2


This is a function of the FO implementation's line breaking algorithm. For example, XEP never breaks lines in this case, instead trying to squeeze the text, while XSL Formatter will break lines wherever it hits the cell boundary. You can also do tricks with putting zero-width spaces at places where a break is allowed, which lets the renderer break an otherwise unbroken sequence of characters. This should work with all FO implementations.
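
The zero-width-space trick, together with the kind of QA check asked about in the question, might be sketched in Python as follows. This is an illustration under stated assumptions rather than a tested production filter: the set of characters to break after, the 20-character thresholds, and the function names are arbitrary choices, and character count is only a crude proxy for whether text will actually fit in a cell.

```python
import re

ZWSP = "\u200b"  # ZERO WIDTH SPACE: invisible, but a permitted break point

# Break after these separator characters (an arbitrary choice for illustration).
BREAK_AFTER = re.compile(r"([/_.\-\\])")


def add_break_hints(text, min_token_len=20):
    """Insert ZWSP after separators inside long whitespace-free tokens."""
    def hint(match):
        token = match.group(0)
        if len(token) < min_token_len:
            return token  # short tokens are left untouched
        return BREAK_AFTER.sub(lambda m: m.group(1) + ZWSP, token)
    return re.sub(r"\S+", hint, text)


def long_tokens(cell_text, max_len=20):
    """QA helper: list tokens that are likely to overflow a narrow cell."""
    return [t for t in cell_text.split() if len(t) > max_len]
```

The QA helper is meant to run on the source text before any hints are inserted, since the inserted characters are not whitespace and would still count toward token length.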


Re: FOSI vs. XSL-FO

Post by liz » Mon Jun 15, 2015 5:03 pm

Editor's note by Liz Fraley: This is Part 2 of the discussion.


Written by: Eliot Kimber
Last Updated: 2006-08-25


XSL-FO 1.1 is fast approaching Candidate Recommendation status. You can view the latest working draft here: http://www.w3.org/TR/xsl11/

Editor's comment by Karl Johan Kleist: this did happen 20 Feb. 2006


1.1 adds a number of important new features, including table markers, which can be used to create "table continued" headers and footers (although I believe that, at least in the short term, Epic will continue to require the use of the FOSI float extension in order to do this).

1.1 also provides pretty complete back-of-the-book index support, change bars, and multiple flows within a single page sequence, as well as a number of smaller enhancements and refinements.

In the abstract, I don't think there can be any question that XSL-FO is the better choice over FOSI, for the simple reason that FOSI is essentially a dead technology while FO has wider acceptance and support and, with the extensions provided by various vendors, satisfies the same composition requirements that FOSI does (that is, primarily technical documents with relatively simple layouts).

If you have requirements to compose documents in non-Western languages, especially CJK, Thai, and Hindi (Devanagari script), then FOSI is not really an option because Epic's support for these languages is either incomplete for many applications or nonexistent (Thai). This is true for pretty much any other composition technology you might try: FrameMaker, XPP, etc. If you need Thai, FO (and XSL Formatter) is pretty much your only option today.

In the context of Epic products specifically, FOSI is still attractive because it's what Epic Composer is optimized for and Epic's FO support is still not complete (in terms of useful FO features). If you have resources who can create and maintain your FOSIs then FOSI may still be the best choice, at least until Epic's FO support provides the same level of layout features and performance that its FOSI support does.

While a practical FO implementation still usually requires the use of proprietary extensions to satisfy layout requirements FO alone can't meet, this isn't that bad:
  • The W3C way is for standards to trail requirements, letting the market drive new features and then standardizing those that get significant implementation. If you look at FO 1.1 you'll see that it almost entirely reflects features that were first implemented as extensions in at least one FO 1.0 implementation. For example, both XEP and XSL Formatter had extensions to support back-of-the-book indexes. In 1.1 we defined new index support that reflects the requirements those extensions supported and what was learned in practice from both of them. That will almost certainly continue to be the case going forward.
  • Because the FO instance is something you normally transform into, there's no *data* commitment to non-standard extensions, just code. In practice, the amount of code in a typical FO generation process that reflects extensions is maybe 10% of the total code (at least that's my experience in writing transforms that support both XEP and XSL Formatter as targets). Using the code modularity features of XSLT it's easy to apply standard engineering practice to further isolate the processor-specific code, minimizing the cost of using extensions.
  • There are some things FO will never standardize for the simple reason that they are outside the scope of the FO spec, such as generation of PDF-specific constructs, print job control, and so on.
  • At the end of the day, the job is getting the pages out in a way that minimizes cost while providing appropriate quality. This is a job that always requires some amount of hacking and fudging. Always has and always will. The best we can do is minimize the hacks and engineer our systems to protect ourselves from our own hacks. In my analysis, for applications for which FO is otherwise suitable, FO+XSLT provides much better hack protection than any other available option.

