Post by fantasai Post by Richard Ishida
i was wondering about how to treat enclosed alphanumerics when
text-transform is set to capitalize.
See the test results at
wrt uppercase or lowercase transforms, the spec simply says "Puts all
letters in lowercase", or vice versa, and that seems to
me appropriate, for those characters that have Unicode mappings. The
results for text-transform-upperlower-027.html indicate that this is
what happens across all major desktop browsers.
For text-transform: capitalize, however, the spec says "Puts the first
*typographic letter unit* of each word in titlecase"
(my emphasis). As you can see in test
text-transform-capitalize-031.html, it makes sense when punctuation
and the like
precede the actual word of the text to look for the first real letter.
(All browsers pass that test.)
it's not clear to me, however, whether a word that only consists of
enclosed alphanumerics (which don't fit the definition of
'typographic letter unit'), or even one that starts with an enclosed
alphanumeric, should be titlecased.
see the results of text-transform-capitalize-026.html. Firefox
currently does not. Chrome and Safari, on the other hand, do
titlecase per the Unicode data. IE titlecases everything except the
first word on the page.
i can't imagine that people will want to do this very often, so this
seems much like an edge case, but i thought i'd ask the
question, all the same.
what's the answer?
I think we should go with whatever the Unicode case mapping files
define, and adjust the CSS spec wording to match.
Sorry to keep beating on this issue, but I'm not sure that really
answers the question here. This isn't primarily about what's in the case
mapping files -- which deal only with individual Unicode characters --
but about identifying the "typographic letter units" to which the case
mapping should be applied.
The current CSS 'capitalize' transform is quite different in that regard
from the toTitlecase(s) function defined by Unicode, which as far as
I can recall is the nearest parallel:
# R3 toTitlecase(X): Find the word boundaries in X according to
# Unicode Standard Annex #29, “Unicode Text Segmentation.” For each
# word boundary, find the first cased character F following the word
# boundary. If F exists, map F to Titlecase_Mapping(F); then map all
# characters C between F and the following word boundary to
# Lowercase_Mapping(C).
In general terms, the key issue here is that toTitlecase applies case
mappings to all the letters of a word (Titlecase to the first, and
Lowercase to the rest), whereas the CSS property applies Titlecase to
the first letter and leaves the rest unchanged. Therefore, given the content
Ramsay MacDonald visits the USA
text-transform:capitalize will result in
Ramsay MacDonald Visits The USA
whereas Unicode's toTitlecase() would give
Ramsay Macdonald Visits The Usa
which I don't think is desirable.
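
The difference can be reproduced with a small Python sketch. This is only
an illustration, not how any engine implements the property: the helper
name css_capitalize is hypothetical, words are split on single spaces
only, "letter" is approximated as Unicode general category L* or N* (an
assumption about the CSS Text definition), and Python's str.title() is
used as a rough stand-in for toTitlecase (it does not use UAX #29 word
boundaries).

```python
import unicodedata


def css_capitalize(text: str) -> str:
    """Titlecase the first letter-like character of each word; leave the rest alone."""
    out = []
    for word in text.split(" "):
        for i, ch in enumerate(word):
            # Assumption: "letter-like" means general category L* or N*.
            if unicodedata.category(ch)[0] in "LN":
                # str.title() on a single character applies its titlecase mapping.
                word = word[:i] + ch.title() + word[i + 1:]
                break
        out.append(word)
    return " ".join(out)


sample = "Ramsay MacDonald visits the USA"
print(css_capitalize(sample))  # Ramsay MacDonald Visits The USA
print(sample.title())          # Ramsay Macdonald Visits The Usa
```

Leading punctuation is skipped by the inner loop, so a word such as
"hello" in quotes still gets its first real letter capitalized.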
Given this difference in approach, I think we should continue to let CSS
Text define exactly what text-transform:capitalize does -- in
particular, which characters it affects -- rather than delegating this
to Unicode.
As Richard points out, the current draft of CSS Text excludes the
enclosed alphanumerics ⓐⓑⓒ etc. from its definition of "typographic
letter units", and therefore they should also be excluded from the
"words" that 'capitalize' affects. IMO, that's the most reasonable
option: these characters are more symbol- or dingbat-like than
letter-like, as reflected in their Unicode General Category of "So". So
I'd like the WG to confirm that this is the correct interpretation of
the spec.
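
For anyone who wants to check the General Category claim, a minimal
sketch using Python's standard unicodedata module follows. The helper
name describe is hypothetical, and the result depends on the Unicode
version bundled with the Python build in use.

```python
import unicodedata


def describe(ch: str) -> tuple:
    """Return (code point, general category, character name) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))


# An ordinary letter followed by three enclosed alphanumerics.
for ch in "aⓐⒶ🅐":
    print(*describe(ch))
```

The ordinary letter reports category Ll, while the three enclosed
characters report So.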
A further issue that I don't think has been mentioned here relates to
the 'uppercase' and 'lowercase' transforms. ISTM that these transforms,
too, should only affect "letters" (or "typographic letter units", as CSS
Text likes to call them) and should leave Symbol characters untouched,
even though some Symbol characters -- by no means all the "enclosed
letter-based" ones -- do have case mappings. The CSS Text draft is less
clear about this, inasmuch as it fails to link the term "letters" in
'uppercase' and 'lowercase' to a definition in the Terminology section
(as earlier drafts did), but the only plausible interpretation I can see
is that "letter" here is shorthand for "typographic letter unit", and so
once again the Symbol characters are excluded.
AFAIK, all engines -- including Gecko, which gets 'capitalize' right by
this interpretation -- currently mishandle this, and apply case mappings
to Symbol characters. However, I doubt that changing our behavior to
match the spec here is likely to "break the Web" in any substantial way,
and it would put us in a more consistent and predictable state. (It
would seem odd that 'text-transform:uppercase' affects ⓐⓑⓒ if
'text-transform:capitalize' does not; or that 'text-transform:lowercase'
affects ⒶⒷⒸ but not 🅐🅑🅒.)
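
The inconsistency, and one possible letters-only behavior, can be
sketched in Python. The helper names is_letter_like and
letters_only_upper are hypothetical; treating general categories L* and
N* as "letters" is an assumption about the CSS Text definition; and
mapping character by character ignores context-sensitive case rules.

```python
import unicodedata


def is_letter_like(ch: str) -> bool:
    # Assumption: approximate "typographic letter unit" by category L* or N*.
    return unicodedata.category(ch)[0] in "LN"


def letters_only_upper(text: str) -> str:
    """Uppercase letter-like characters; leave Symbol characters untouched."""
    return "".join(ch.upper() if is_letter_like(ch) else ch for ch in text)


print("abc ⓐⓑⓒ".upper())              # ABC ⒶⒷⒸ  (plain Unicode mapping)
print(letters_only_upper("abc ⓐⓑⓒ"))  # ABC ⓐⓑⓒ  (symbols left alone)
print("ⒶⒷⒸ 🅐🅑🅒".lower())            # ⓐⓑⓒ 🅐🅑🅒 (only some symbols have mappings)
```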
In summary, I think the CSS Text spec should maintain its definition of
these transforms as applying only to letters, and should reinstate its
link to the definition of "[typographic] letter [unit]" for 'uppercase'
and 'lowercase' to reinforce this. An informative note could be added
alerting implementers to the fact that some non-Letter characters have
case mappings defined in Unicode, but should *not* be affected by these
transforms.

Source of the toTitlecase definition quoted above:
http://www.unicode.org/versions/Unicode7.0.0/ch03.pdf, page 154.