[css-text] Enclosed alphanumerics and text-align:capitalize
Richard Ishida
2015-03-10 15:29:32 UTC
i was wondering about how to treat enclosed alphanumerics when
text-transform is set to capitalize.

See the test results at
http://www.w3.org/International/tests/repo/results/text-transform

wrt uppercase or lowercase transforms, the spec simply says "Puts all
letters in lowercase", or vice versa, and that seems to me appropriate,
for those characters that have Unicode mappings. The tests
text-transform-upperlower-026.html, text-transform-upperlower-027.html
indicate that this is what happens across all major desktop browsers.

For text-transform: capitalize, however, the spec says "Puts the first
*typographic letter unit* of each word in titlecase" (my emphasis). As
you can see in test text-transform-capitalize-031.html, it makes sense
when punctuation and the like precede the actual word of the text to
look for the first real letter. (All browsers pass that test.)

it's not clear to me, however, whether a word that consists only of
enclosed alphanumerics (which don't fit the definition of 'typographic
letter unit'), or even one that starts with a character from the
Enclosed Alphanumerics block, should be titlecased: see the results of
text-transform-capitalize-026.html. Firefox currently does not titlecase
such words. Chrome and Safari, on the other hand, do titlecase per the
Unicode data. IE titlecases everything except the first word on the page.

i can't imagine that people will want to do this very often, so this
seems much like an edge case, but i thought i'd ask the question, all
the same.

what's the answer?

ri
Martin J. Dürst
2015-03-11 06:32:08 UTC
Hello Richard,

I think this is a difficult call. On the one hand, it makes sense to
change the first letter in a string of enclosed letters. On the other
hand, it doesn't make sense to change it if it's part of some
punctuation (e.g. as part of a list or some such).

So the alternatives are either to make this context-dependent or to just
choose a solution and stick with it. My guess is that this is an edge
case, so just picking the currently more popular solution and sticking
with it may be best.

Regards, Martin.
John Hudson
2015-03-13 21:34:41 UTC
It seems to me very unfortunate that the enclosed alphanumeric
characters have case mappings at all, because their typographic use is
primarily as symbols, not as letters. As such, I would expect their
'casing' to be fixed, and not to be affected by capitalisation. So, for
instance, I would expect capitalisation of this string

ⓐalpha
to be
ⓐAlpha
not
Ⓐalpha


JH
--
Tiro Typeworks www.tiro.com
Gulf Islands, BC ***@tiro.com

If stung by another man's bee, one must calculate the
extent of the injury, but also, if one swatted it in the
process, subtract the replacement value of the bee.
— Mediaeval Irish legalism
Jonathan Kew
2015-03-14 19:04:21 UTC
Post by John Hudson
It seems to me very unfortunate that the enclosed alphanumeric
characters have case mappings at all, because their typographic use is
primarily as symbols, not as letters. As such, I would expect their
'casing' to be fixed, and not to be affected by capitalisation. So, for
instance, I would expect capitalisation of this string
ⓐalpha
to be
ⓐAlpha
not
Ⓐalpha
I agree this is the most reasonable behavior. Browsers differ on this;
Gecko will render your example as ⓐAlpha, whereas WebKit and Blink
produce "Ⓐalpha".

According to the current CSS Text spec, Gecko's behavior here is
correct, AFAICT. However, the test at [1] disagrees.

JK


[1]
http://www.w3.org/International/tests/repo/run?base=css-text-3&batch=text-transform&test=text-transform/text-transform-capitalize-026.html
Richard Ishida
2015-03-14 21:49:42 UTC
Post by Jonathan Kew
According to the current CSS Text spec, Gecko's behavior here is
correct, AFAICT. However, the test at [1] disagrees.
JK
[1]
http://www.w3.org/International/tests/repo/run?base=css-text-3&batch=text-transform&test=text-transform/text-transform-capitalize-026.html
fwiw, it's true that, currently, the test at [1] does what Unicode
suggests by default, rather than what the CSS spec says. I just want
to be sure before i change it.

I'm personally inclined to agree with John and Jonathan that the CSS spec
is right. I assume that these characters are used mainly as symbols,
rather than to form words. I just don't feel i'm sufficiently aware of
all the possible use cases to be sure.

ri
fantasai
2015-03-18 03:15:15 UTC
I think we should go with whatever the Unicode case mapping files
define, and adjust the CSS spec wording to match.

~fantasai
Jonathan Kew
2015-03-23 18:29:39 UTC
Post by fantasai
I think we should go with whatever the Unicode case mapping files
define, and adjust the CSS spec wording to match.
Sorry to keep beating on this issue, but I'm not sure that really
answers the question here. This isn't primarily about what's in the case
mapping files -- which deal only with individual Unicode characters --
but about identifying the "typographic letter units" to which the case
mapping should be applied.

The current CSS 'capitalize' transform is quite different in that regard
from the toTitlecase(s) function[1] defined by Unicode, which as far as
I can recall is the nearest parallel:

# R3 toTitlecase(X): Find the word boundaries in X according to
# Unicode Standard Annex #29, “Unicode Text Segmentation.” For each
# word boundary, find the first cased character F following the word
# boundary. If F exists, map F to Titlecase_Mapping(F); then map all
# characters C between F and the following word boundary to
# Lowercase_Mapping(C).

In general terms, the key issue here is that toTitlecase applies case
mappings to all the letters of a word (Titlecase to the first, and
Lowercase to the rest), whereas the CSS property applies Titlecase to
the first letter and leaves the rest unchanged. Therefore, given content
such as

Ramsay MacDonald visits the USA

text-transform:capitalize will result in

Ramsay MacDonald Visits The USA

whereas Unicode's toTitlecase() would give

Ramsay Macdonald Visits The Usa

which I don't think is desirable.
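
For illustration only, the difference between the two approaches can be sketched in Python. This is a rough sketch under stated assumptions, not what any engine implements: words are split on spaces rather than by UAX #29, a "letter" is assumed to be any character whose General Category starts with L or N, and the names css_like_capitalize and unicode_like_titlecase are made up for this example.

```python
import unicodedata


def is_letter(ch: str) -> bool:
    # Assumption: General Category L* or N* counts as a "letter".
    return unicodedata.category(ch)[0] in ("L", "N")


def is_cased(ch: str) -> bool:
    # Rough stand-in for "cased": some case mapping changes the character.
    return ch.lower() != ch or ch.upper() != ch


def css_like_capitalize(text: str) -> str:
    # Titlecase the first letter of each word; leave everything else alone.
    words = []
    for word in text.split(" "):
        for i, ch in enumerate(word):
            if is_letter(ch):
                word = word[:i] + ch.title() + word[i + 1:]
                break
        words.append(word)
    return " ".join(words)


def unicode_like_titlecase(text: str) -> str:
    # Titlecase the first cased character of each word, lowercase the rest.
    words = []
    for word in text.split(" "):
        for i, ch in enumerate(word):
            if is_cased(ch):
                word = word[:i] + ch.title() + word[i + 1:].lower()
                break
        words.append(word)
    return " ".join(words)
```

With these definitions, css_like_capitalize keeps "MacDonald" and "USA" intact, while unicode_like_titlecase lowercases their tails. Applied to "ⓐalpha", the first function skips the circled letter (category So) and the second one does not.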

Given this difference in approach, I think we should continue to let CSS
Text define exactly what text-transform:capitalize does -- in
particular, which characters it affects -- rather than delegating this
to Unicode.

As Richard points out, the current draft of CSS Text excludes the
enclosed alphanumerics ⓐⓑⓒ etc. from its definition of "typographic
letter units", and therefore they should also be excluded from the
"words" that 'capitalize' affects. IMO, that's the most reasonable
option: these characters are more symbol- or dingbat-like than
letter-like, as reflected in their Unicode General Category of "So". So
I'd like the WG to confirm that this is the correct interpretation of
the spec.
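
The General Category point can be checked with Python's unicodedata module. The helper name has_case_mapping is made up here, and the results depend on the Unicode data bundled with the interpreter, so treat this as a quick check rather than a normative statement.

```python
import unicodedata


def has_case_mapping(ch: str) -> bool:
    # True if upper- or lowercasing changes the character.
    return ch.upper() != ch or ch.lower() != ch


circled_small_a = "\u24d0"  # ⓐ CIRCLED LATIN SMALL LETTER A

# A symbol by General Category ("So"), yet it still carries a case mapping.
category = unicodedata.category(circled_small_a)
mapped = has_case_mapping(circled_small_a)
```

Here category is "So" (Symbol, other) and mapped is True, which is the mismatch under discussion: a character that is not a letter by category but that Unicode case mapping still changes.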

A further issue that I don't think has been mentioned here relates to
the 'uppercase' and 'lowercase' transforms. ISTM that these transforms,
too, should only affect "letters" (or "typographic letter units", as CSS
Text likes to call them) and should leave Symbol characters untouched,
even though some Symbol characters -- by no means all the "enclosed
letter-based" ones -- do have case mappings. The CSS Text draft is less
clear about this, inasmuch as it fails to link the term "letters" in
'uppercase' and 'lowercase' to a definition in the Terminology section
(as earlier drafts did), but the only plausible interpretation I can see
is that "letter" here is shorthand for "typographic letter unit", and so
once again the Symbol characters are excluded.

AFAIK, all engines -- including Gecko, which gets 'capitalize' right by
this interpretation -- currently mishandle this, and apply case mappings
to Symbol characters. However, I doubt that changing our behavior to
match the spec here is likely to "break the Web" in any substantial way,
and it would put us in a more consistent and predictable state. (It
would seem odd that 'text-transform:uppercase' affects ⓐⓑⓒ if
'text-transform:capitalize' does not; or that 'text-transform:lowercase'
affects ⒶⒷⒸ but not 🅐🅑🅒.)
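
As a rough illustration of the inconsistency and of a letters-only transform, here is a Python sketch. The function name lowercase_letters_only is hypothetical, and "letter" is simplified to General Category L*, which is narrower than the spec's definition.

```python
import unicodedata


def lowercase_letters_only(text: str) -> str:
    # Apply the lowercase mapping only to characters in category L*.
    return "".join(
        ch.lower() if unicodedata.category(ch).startswith("L") else ch
        for ch in text
    )


circled = "\u24b6\u24b7\u24b8"               # ⒶⒷⒸ
negative = "\U0001F150\U0001F151\U0001F152"  # 🅐🅑🅒

plain_circled = circled.lower()    # changed: these have lowercase mappings
plain_negative = negative.lower()  # unchanged: no lowercase mappings
restricted = lowercase_letters_only(circled + " ABC")
```

A plain lower() changes the circled capitals but not the negative circled ones, whereas the letters-only version leaves both kinds of symbol alone and still lowercases "ABC".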

In summary, I think the CSS Text spec should maintain its definition of
these transforms as applying only to letters, and should reinstate its
link to the definition of "[typographic] letter [unit]" for 'uppercase'
and 'lowercase' to reinforce this. An informative note could be added
alerting implementers to the fact that some non-Letter characters have
case mappings defined in Unicode, but should *not* be affected by these
text-transform values.

JK


[1] http://www.unicode.org/versions/Unicode7.0.0/ch03.pdf, page 154.
Richard Ishida
2015-03-24 16:05:43 UTC
Post by Jonathan Kew
In summary, I think the CSS Text spec should maintain its definition of
these transforms as applying only to letters, and should reinstate its
link to the definition of "[typographic] letter [unit]" for 'uppercase'
and 'lowercase' to reinforce this. An informative note could be added
alerting implementers to the fact that some non-Letter characters have
case mappings defined in Unicode, but should *not* be affected by these
text-transform values.
actually, i'm inclined to disagree here.

i tend to agree that capitalize should not affect enclosed
alphanumerics, mainly because i can't really imagine use cases where
people will want words made of these things, like Ⓐⓑⓒ, and i expect that
they are more likely to be used as counters, in which case you'd want
them all uppercase or all lowercase, and unaffected by capitalisation.

i can, however, see situations where i might want to convert lowercase
enclosed alphanumeric counters to uppercase, or vice versa, using
styling rather than by changing the actual characters (cf. number forms,
which are similar and behave in the same way)[ⅰ] – and so i think that
the difference in behaviour between capitalize on the one hand and
uppercase/lowercase on the other is ok. Furthermore, i'm worried that
changing the spec may break some usage of these characters (albeit small
in number) in legacy content, given the fact that browsers currently
seem to unanimously agree in their support of upper/lowercase for these
characters.

so in my view, the spec already says what should happen, and we don't
need to change it.


ri




[ⅰ]
http://www.w3.org/International/tests/repo/results/text-transform#upperlower
Jonathan Kew
2015-03-24 16:24:48 UTC
Post by Richard Ishida
i can, however, see situations where i might want to convert lowercase
enclosed alphanumeric counters to uppercase, or vice versa, using
styling rather than by changing the actual characters (cf. number forms,
which are similar and behave in the same way)[ⅰ] – and so i think that
the difference in behaviour between capitalize on the one hand and
uppercase/lowercase on the other is ok. Furthermore, i'm worried that
changing the spec may break some usage of these characters (albeit small
in number) in legacy content, given the fact that browsers currently
seem to unanimously agree in their support of upper/lowercase for these
characters.
I could live with that -- I'm not thrilled about the different treatment
of those characters in capitalize vs upper/lowercase, but I don't really
feel strongly about it....
Post by Richard Ishida
so in my view, the spec already says what should happen, and we don't
need to change it.
...but in this case, the spec needs to adjust its definition of
'uppercase' and 'lowercase', which currently says they apply to
"letters". The only definition of "letter" I can find in the spec[1] is
as an alternative for "typographic letter unit", which does not include
Symbol characters.

JK

[1] http://dev.w3.org/csswg/css-text/#characters
fantasai
2016-09-24 17:22:24 UTC
Post by Richard Ishida
i was wondering about how to treat enclosed alphanumerics when text-transform is set to capitalize.
See the test results at http://www.w3.org/International/tests/repo/results/text-transform
Based on the discussion that followed, particularly John Hudson
and Jonathan Kew's comments in
http://www.w3.org/mid/***@tiro.com
https://www.w3.org/mid/***@gmail.com
I'm inclined to maintain the case-transform restriction to letters.
Unless i18n objects, I'll confirm with the WG shortly.

~fantasai
fantasai
2018-03-05 06:45:57 UTC
Post by fantasai
Based on the discussion that followed, particularly John Hudson
and Jonathan Kew's comments in
I'm inclined to maintain the case-transform restriction to letters.
Unless i18n objects, I'll confirm with the WG shortly.
Just to follow-up, this was resolved in
https://lists.w3.org/Archives/Public/www-style/2016Oct/0068.html

Thanks i18n for updating the test!

~fantasai
