For those for whom web standards matter, the devil is always in the detail. The content-management system Drupal throws such details at us by the bucketload. No doubt other CMSs do the same. Fortunately, for the particular problem described here, Drupal (both versions 7 and 8) comes with its own solution.
A common scenario is to create a list of links to posts - and to include in the list a snippet (or teaser) of each post's text. This list may be in the form of a grid or just a vertical stack of posts. Each item will typically consist of a title, a publication date and then the all-important snippet of text. Sounds simple enough. There's an example on the home page of this website.
What if the snippet of text contains some HTML tags? Maybe the author has wrapped a word or two in <strong>bold</strong> or <em>italic</em> tags. And why not? Easy enough to do. Open the tag, write your text, close the tag. Always close the tag; never leave it unclosed. Basic stuff.
Here's an example, viewed from the editing side of things:
The grid that displays some of the opening text of that post is set to grab only the first 250 characters of the post's body text. With the post's title and published date (and a graphic for good measure), here's what the final result should be:
By coincidence, the 250th character of that particular post lands within mark-up tags for italic text. The red vertical line in the next screenshot shows where that point falls. (Drupal cuts the text off near that 250th character mark, but never mid-word.)
Our snippet of text in the Drupal Views grid is now going to terminate mid-code, leaving a field with a run of mark-up that is missing its closing tag. Despite the author's care to use even the simplest HTML correctly, the CMS she's publishing it in is going to mangle things and throw sand in the gears. Unless one does something about it, it's possible that either the code for that part of the page breaks or the CMS notices something is wrong, attempts to fix it, but renders an even more broken version.
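To make that concrete, here is a minimal sketch in Python (not Drupal's own code, which is PHP) of a trim that respects word boundaries but knows nothing about tags. The function name, the sample sentence and the 45-character limit are all made up for illustration; the real view uses 250 characters.

```python
def trim_on_word_boundary(text: str, max_length: int) -> str:
    """Cut text to at most max_length characters without splitting a word.

    Hypothetical helper for illustration; it is deliberately unaware of HTML.
    """
    if len(text) <= max_length:
        return text
    cut = text[:max_length]
    # If the cut lands mid-word, back up to the last space before it.
    if not text[max_length].isspace():
        last_space = cut.rfind(" ")
        if last_space != -1:
            cut = cut[:last_space]
    return cut.rstrip()


body = "The author took care to <em>close every tag she opened</em> in this post."
teaser = trim_on_word_boundary(body, 45)

print(teaser)
# The opening <em> survives the trim, but its closing tag does not.
print(teaser.count("<em>"), teaser.count("</em>"))
```

The author's mark-up was valid; it is the trim that leaves the <em> tag open.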
It's perfectly possible for your CMS's text-rendering engine to get thoroughly confused and start spitting out erroneous tags, as below. Here a truncated snippet that ends in italic marked-up text has had its closing tag cut off, and the result is <em>italic</em> tags being sprayed throughout the mark-up that follows. Whoops. HTML validation fails. SEO digestibility possibly falters. Who knows?
So there's an everyday problem for us folk working with CMSs to manage our publishing of content on the internet. That's one of the gotchas that Drupal Views can hit us with. We've been teaching our authors how to write even the simplest HTML but have apparently left them at the mercy of clunky publishing systems.
Fortunately that wonder of a system Drupal Views, as well as bungling this post teaser-building process, has a built-in fix for it. The hitch here is that it's not switched on by default. That's been left for you to do.
Open up the view in question, click on the Body field that is part of that view, open up its Rewrite results panel and there is the option we want. It's the Field can contain HTML checkbox. It happens to be right under the options that we've already played with to get the Body field to be trimmed to a designated maximum length. So check that checkbox and the problem will be solved.
Note the small print right under that option: "if checked, HTML corrector will be run to ensure tags are properly closed after trimming". Ah! We already have Drupal's in-built HTML corrector configured for both pages (where this particular view will be displayed) and for the content type being displayed in the view itself, don't we? Right. But this option forces the corrector to go to work on the content when it is extracted for use by a view. Thus we get cleaned-up mark-up with properly closed tags irrespective of what sort of content-trimming is taking place. Another bow to nerdiness, please! HTML validation is ensured once more.
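For the curious, the idea behind such a corrector can be sketched in a few lines of Python. This is not how Drupal implements it; it is only an illustration of the principle, using the standard library's html.parser. The names OpenTagTracker and close_open_tags are hypothetical, and the sketch assumes the trim never cuts inside a tag itself (so no dangling "<em" fragments).

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are never "left open".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class OpenTagTracker(HTMLParser):
    """Records which tags are still open at the end of a fragment."""

    def __init__(self):
        super().__init__()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; ignore stray closers.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def close_open_tags(fragment: str) -> str:
    """Append closing tags for anything the trim left open, innermost first."""
    tracker = OpenTagTracker()
    tracker.feed(fragment)
    tracker.close()
    closers = "".join(f"</{tag}>" for tag in reversed(tracker.open_tags))
    return fragment + closers


print(close_open_tags("The author took care to <em>close every tag"))
print(close_open_tags("<p>One <strong>two <em>three"))
```

Closing the innermost tag first keeps nested mark-up properly nested, which is what a validator cares about.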
You might argue that I shouldn't make authors apply HTML tags by hand. Why not use a WYSIWYG text-editor? Yes, I could (if there were one that never - ever - mangled content by introducing extraneous mark-up). But even if there were a faultless WYSIWYG editor, authors would still be introducing opening and closing tags. And Drupal Views would still be truncating fields within these tags.
You might also argue that Drupal Views can be configured to strip HTML tags from text fields, whether truncated or not. True, but maybe we really want to display bold or italic text in the view summary.
You might even argue that authors should be prevented from adding - or applying - tags in the first, say, 275 characters of a node's body. Yes, that's possible. But what if the view displays summaries of body fields from six or eight different content types? One can't retro-format that many content types with their existing published nodes.
So why didn't you factor in this problem before building this view of summaries?! Well, one good answer would be that the view of summaries was only thought of and put in place long after all the content had been published. (That's web design. It's dull, dirty and imperfect.) But the main reason it never bothered me was that I knew that Drupal Views could provide that particular fix and I applied it in case it was ever needed.
Drupal Views footnote
For the jaded amongst you - which by default must be those of you who don't work with Drupal - here's a close-up of the Drupal Views interface. It is a thing of wonder to behold and is to a Drupaler what a cleaver must be to a butcher, a stethoscope to a doctor or a pair of binoculars to an ornithologist. Field can contain HTML!