[css-text] text-transform:capitalize and Unicode digraphs
Jonathan Kew
2015-03-15 18:50:56 UTC
Unicode includes a few digraph characters such as "dz" and "lj" that have
uppercase (DZ, LJ) and titlecase (Dz, Lj) equivalents. How should these be
handled by text-transform:capitalize when they occur in word-initial
position?

It's clear that the lowercase digraphs (dz) will be transformed according
to their titlecase mapping (Dz), and that titlecase digraphs will be
unchanged. But what should be done when the text contains an uppercase
digraph such as DZ?
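
To make the three case forms concrete, here is a minimal Python sketch. It assumes the digraphs meant here are the single code points U+01F1 to U+01F3, and it writes them as escapes so that they cannot be confused with two-letter sequences. The helper name `titlecase_char` is only illustrative; it relies on `str.title()` applying the Unicode titlecase mapping to the first cased character of a string.

```python
import unicodedata

# The three case forms of the "dz" digraph as explicit code points.
DZ_UPPER = "\u01F1"  # LATIN CAPITAL LETTER DZ
DZ_TITLE = "\u01F2"  # LATIN CAPITAL LETTER D WITH SMALL LETTER Z
DZ_LOWER = "\u01F3"  # LATIN SMALL LETTER DZ


def titlecase_char(ch: str) -> str:
    """Return the Unicode titlecase mapping of a single character."""
    # On a one-character string, str.title() yields the titlecase mapping.
    return ch.title()


for ch in (DZ_UPPER, DZ_TITLE, DZ_LOWER):
    mapped = titlecase_char(ch)
    print(f"U+{ord(ch):04X} ({unicodedata.category(ch)}) "
          f"-> titlecase U+{ord(mapped):04X}")
```

All three forms map to U+01F2, which is why a strict "put the first letter in titlecase" rule also rewrites the uppercase digraph.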

By a strict reading of the current CSS Text draft[1]:

# 'capitalize'
# Puts the first typographic letter unit of each word in titlecase;
# other characters are unaffected.

together with the Unicode standard, which gives Dz as the titlecase
mapping for DZ, it appears that a word-initial uppercase digraph should
be converted to its titlecase (mixed) form. This is the behavior I see
in WebKit and Blink with an example like:

data:text/html;charset=utf-8,<div style="text-transform:capitalize">DZa Dza dza

which renders all three "words" identically: "Dza Dza Dza". Gecko, in
contrast, does NOT apply the titlecase mapping if the first letter is
already uppercase, and so the example renders as "DZa Dza Dza".

Although the spec/WebKit/Blink behavior looks "better" for this
(artificial) example, I would argue that Gecko's behavior is preferable.
While the "DZa" result here does look poor, it makes little sense for an
author to enter text in this form in the first place. In contrast,
consider what happens if text that is originally entered as
all-uppercase is subject to text-transform:capitalize:

data:text/html;charset=utf-8,<div style="text-transform:capitalize">LJUBLJANA

Here, WebKit and Blink will render the word as "LjUBLJANA", while Gecko
gives the (better) result "LJUBLJANA".

IMO, this example -- where the entire word is uppercase -- seems more
important than the case where an uppercase digraph has been used to
begin an otherwise-lowercase word.
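
The two behaviors can be compared side by side with a rough Python sketch. This is not how any browser engine implements the property; it assumes words are whitespace-separated runs and that the first character of a word is its first typographic letter unit. The function names are hypothetical, and the digraphs are written as single code points (U+01F1 to U+01F3 for dz, U+01C7 to U+01C9 for lj).

```python
import re


def capitalize_strict(text: str) -> str:
    """Strict reading: always titlecase the first character of each word."""
    return re.sub(r"\S+", lambda m: m.group()[0].title() + m.group()[1:], text)


def capitalize_keep_upper(text: str) -> str:
    """Gecko-like reading: leave the first character alone if it is uppercase."""
    def fix(match: re.Match) -> str:
        word = match.group()
        if word[0].isupper():
            return word  # already uppercase, so unchanged
        return word[0].title() + word[1:]
    return re.sub(r"\S+", fix, text)


# "DZa Dza dza" with single-code-point digraphs.
mixed = "\u01F1a \u01F2a \u01F3a"
# "LJUBLJANA" with both LJ pairs as the uppercase digraph U+01C7.
city = "\u01C7UB\u01C7ANA"

for sample in (mixed, city):
    print(ascii(capitalize_strict(sample)), ascii(capitalize_keep_upper(sample)))
```

In this sketch the strict version turns the leading U+01C7 of the all-caps word into the titlecase U+01C8, while the Gecko-like version returns the word untouched.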

So I'd like to propose a minor change to the definition, something like:

# 'capitalize'
# Puts the first typographic letter unit of each word in titlecase,
# unless it is already uppercase, in which case it is unchanged. Other
# characters are unaffected.

An alternative, perhaps even better, would be to make it contextual:

# Puts the first typographic letter unit of each word in titlecase,
# unless it is already uppercase and is followed by another uppercase
# letter, in which case it is unchanged. Other characters are unaffected.

However, given that text-transform:capitalize is likely to remain a
rather crude instrument -- it doesn't "know" about language-specific
stop lists of small words that should not be capitalized, for example --
I don't think the additional implementation cost of making it
context-dependent is worthwhile.
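
For comparison, the contextual alternative might look like the following Python sketch. It makes the same simplifying assumptions as any toy model of the property (whitespace-separated words, first character taken as the first typographic letter unit), and `capitalize_contextual` is a hypothetical name. The extra cost is the one-character lookahead.

```python
import re


def capitalize_contextual(text: str) -> str:
    """Titlecase the first character of each word, unless it is uppercase
    and the character after it is uppercase too."""
    def fix(match: re.Match) -> str:
        word = match.group()
        first, rest = word[0], word[1:]
        # rest[:1] is "" for one-character words, and "".isupper() is False.
        if first.isupper() and rest[:1].isupper():
            return word
        return first.title() + rest
    return re.sub(r"\S+", fix, text)


# Uppercase digraph U+01F1 followed by a lowercase letter: gets titlecased.
print(ascii(capitalize_contextual("\u01F1a")))
# All-caps word starting with the uppercase digraph U+01C7: left unchanged.
print(ascii(capitalize_contextual("\u01C7UB\u01C7ANA")))
```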

Feedback/comments welcomed....

JK


[1] http://dev.w3.org/csswg/css-text-3/#propdef-text-transform
Jonathan Kew
2015-03-15 22:18:51 UTC
Post by Jonathan Kew
>> Although the spec/WebKit/Blink behavior looks "better" for this
>> (artificial) example, I would argue that Gecko's behavior is
>> preferable. While the "DZa" result here does look poor, it makes little
>> sense for an author to enter text in this form in the first place. In
>> contrast, consider what happens if text that is originally entered as
>> all-uppercase is subject to text-transform:capitalize:
>>
>> data:text/html;charset=utf-8,<div style="text-transform:capitalize">LJUBLJANA
>>
>> Here, WebKit and Blink will render the word as "LjUBLJANA", while Gecko
>> gives the (better) result "LJUBLJANA".
>
> This example seems contrived. In the improved case (LJUBLJANA), you can
> get the same result by not using the property at all.

Yes, clearly it wouldn't make much sense to write such an example directly.

But I think it's reasonable to suppose that sites might be applying
text-transform:capitalize to elements such as headlines that are being
pulled from external data sources, and that some of that external data
-- not under the control of the designer writing the CSS for the
aggregating site -- might at times be provided in all-caps.

JK
Brad Kemper
2015-03-16 16:13:34 UTC
>> But I think it's reasonable to suppose that sites might be applying text-transform:capitalize to elements such as headlines that are being pulled from external data sources, and that some of that external data -- not under the control of the designer writing the CSS for the aggregating site -- might at times be provided in all-caps.
>
> That seems unlikely;

It happens all the time.

> if the vast majority of headlines one is aggregating uses conventional title case,

Depends on the source of your headlines. If the source is the first several words of a comment someone left, for instance, it might be in all caps, all lowercase, or anything in between. In such cases, text-transform:capitalize is a good way to stylistically normalize the case into something that looks like a title.

> then text-transform: capitalize is going to make most imported content look worse by capitalizing things like articles, conjunctions, prepositions, and proper nouns like "amiibo", "document.URL", or "iPhone" which, conventionally, begin with a lowercase letter.

If your source is so pristine that everyone writing the headlines is consistently following the style guide for not capitalizing articles, conjunctions, prepositions, etc., then you don't need text-transform:capitalize. But others do, even if it is a more simplistic algorithm.

> If an aggregator is willing to mangle their imported text like that, then I don't see why they'd be particularly concerned about an all-caps headline being restyled with a title case digraph.

Because we don't live in a perfect world, and "better" or "good enough" is often better than "didn't even try to improve" something that starts off with a lot of inconsistencies.

> The above use-case seems especially unlikely because it requires three unlikely scenarios to occur at once: (A) an author applies text-transform: capitalize to all of their imported headlines;

Not at all unlikely, in many situations.

> (B) the author is importing content with malformed, all-caps headlines;

Not at all unlikely.

> and (C) some of those all-caps headlines contain digraphs.
fantasai
2018-03-05 06:53:58 UTC
> Unicode includes a few digraph characters such as "dz" and "lj" that have uppercase (DZ, LJ) and titlecase (Dz, Lj) equivalents. How
> should these be handled by text-transform:capitalize when they occur in word-initial position?
>
> It's clear that the lowercase digraphs (dz) will be transformed according to their titlecase mapping (Dz), and that titlecase
> digraphs will be unchanged. But what should be done when the text contains an uppercase digraph such as DZ?
> ...
> # 'capitalize'
> #     Puts the first typographic letter unit of each word in titlecase, unless it is already uppercase, in which case it is
> #     unchanged. Other characters are unaffected.
>
> #     Puts the first typographic letter unit of each word in titlecase, unless it is already uppercase and is followed by
> #     another uppercase letter, in which case it is unchanged. Other characters are unaffected.
>
> However, given that text-transform:capitalize is likely to remain a rather crude instrument -- it doesn't "know" about
> language-specific stop lists of small words that should not be capitalized, for example -- I don't think the additional
> implementation cost of making it context-dependent is worthwhile.

Hi Jonathan,
Hi Jonathan,
The CSSWG accepted your proposed changes in
https://lists.w3.org/Archives/Public/www-style/2016Oct/0068.html
and the changes were committed in
https://hg.csswg.org/drafts/rev/11e8aa074031

Please let me know if that resolves the issue, or if further edits are needed.
Thanks (and sorry for the belated response)!

~fantasai
