-
Notifications
You must be signed in to change notification settings - Fork 719
[css-syntax] custom property names too permissive, require namespacing rules #7129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Yes, custom property names can contain literally any non-ASCII character. If necessary, I'm happy to restrict this, but I question why it would be necessary to do this for property names, but okay for property values to still be fully unrestricted? |
Property names are used in CSS "code" and have to be be parsed, matched, and otherwise referenced. Abusive names can cause spoofing problems (even though the underlying code point sequence is still just some integers and the parser may not care). For example, is Property values are data and can include natural language text (as well as, well, any character data, including junk). While the value space might be limited by applications in different ways, there don't appear to be any requirements to do so here. In fact, your Spec goes out of its way to highlight this fact:
|
Property values are used in CSS just as much as property names, tho. We can't interpret custom property values in the custom property, but we do as soon as they're substituted into a non-custom property, or if the custom property is registered with a grammar. Moreso, tho, the |
Yes, it occurred to me that this might turn out to be a gap in CSS Syntax (which might be a serious "ouch" and difficult to do something about). Since one of the things the property value can contain is a string literal, one probably can't apply UAX31-like rules just generically to the value (i.e. in CSS Variables) |
Right, removing the potential from strings def seems out (at minimum, they def shouldn't be restricted to the Identifier production from Unicode), but I think I didn't make my original point clearly enough - this restriction should apply to all custom identifiers, not just property names, right? |
That's right. |
Okay, I'm gonna retag this to Syntax, then, because we should handle the restriction at that level. |
Agenda+ to discuss restricting the allowed codepoints in an ident sequence (used in keywords, function names, dimension units, selectors, property names, etc). Possible options:
Then there are subsequent questions. First, what should we do about characters so restricted that are used in an identifier sequence?
Second, should we allow escapes to represent the restricted character in identifiers?
|
Since this is now being solved at the CSS Syntax level, which is the correct way to do it, untagging from CSS Variables |
Note that the latter two of the three identifiers in my popular tweet would be invalid under the JS rules:
I'f I'm reading correctly, under the HTML rules the middle one Not that this is required to be supported, just noting the effects. ^_^ |
The CSS Working Group just discussed
The full IRC log of that discussion |
The HTML allowed chars are:
We'd continue to disallow |
@aphillips does that resolution address your concerns? |
Programming languages such as JS and Java that allow non-ASCII variable names with character limits usually have different restrictions for the initial character. Most notably they forbid combining marks. They sometimes exclude other values (such as bidi controls, although those are excluded above). I think using the HTML restrictions is a realistic solution for CSS for the reasons given above. We might need a note about combining mark handling at the start of a token. HTML handles this by requiring an alpha char ( |
Ah, hm, indeed. HTML gets to avoid the first-character problem. CSS does have special rules for the first character of an ident as well, but they're only different in the ASCII range (preventing idents from being number-lookalikes). But if the concern is just about combining characters at the start, that'll be fine mechanically; the CSS parser still just handles codepoints individually and doesn't care about combining in any way. So you could select the class you mention without worry, by putting that combining char after a period. That said, I'm fine with further first-char restrictions if necessary. Unless you request otherwise tho, @aphillips , I'll assume that the existing rules should be fine and use the same non-ASCII allowed characters for both initial and non-initial chars. |
As long as the CSS parser doesn't care, then things should work. Content authors will need to be advised about the spoof/abuse potential when viewing CSS files as text somewhere (you may even already have such a note??) |
Don't have such a note, but I'm look over some of the verbage used elsewhere for that issue and add one. |
…s to the same list that HTML allows in custom element names. #7129
Okay, restriction applied, and I added a significant note to that section. I'll need to add and/or tweak some tests for this. |
CC’ing @markusicu @macchiati (Unicode “Trojan Source” working group) as FYI |
Note that I'm currently linking to a Rust-lang blog post about the trojan source thing; if there's a better "official source" about it from Unicode I'd be happy to switch the reference over. |
Here's Unicode's announcement, which has a link to @macchiati et al's doc about the topic. |
Shouldn't serialize an identifier be modified accordingly? - If the character is not handled by one of the above rules and is greater than or equal to U+0080, is "-" (U+002D) or "_" (U+005F), or is in one of the ranges [0-9] (U+0030 to U+0039), [A-Z] (U+0041 to U+005A), or \[a-z] (U+0061 to U+007A), then the character itself.
+ If the character is not handled by one of the above rules and is a [non-ASCII ident code point](https://drafts.csswg.org/css-syntax-3/#non-ascii-ident-code-point), "-" (U+002D), "_" (U+005F), or is in one of the ranges [0-9] (U+0030 to U+0039), [A-Z] (U+0041 to U+005A), or \[a-z] (U+0061 to U+007A), then the character itself. |
Yes, it should be. |
Is it correct that the ranges are inclusive?
Can we change the current text to :
Update : I forgot I asked this here and asked it again in #8861 (comment) This has been answered and needed edits were made. |
Uh oh!
There was an error while loading. Please reload this page.
Originally raised on CSS Variables, but later discussion concluded the best fix is to change CSS Syntax. Original post below:
https://www.w3.org/TR/css-variables-1/#defining-variables
The above text defines the custom property name as "any valid identifier". Tracing that definition back to CSS Values and thence to
ident token
, we find that the name can contain any Unicode code point > U+0080. This includes various special forms of whitespace as well as potential problem characters, such as bidi controls (such as might cause "Trojan Source" attacks). Namespacing is definitely a complicated problem: the I18N WG doesn't want groups to cherry-pick characters (thereby excluding certain languages from using the feature).Most programming languages attempt to address this by adopting some form of restriction for variable names such as those found in UAX31 Unicode Identifier and Pattern Syntax. In JavaScript, for example, the definition looks like the one found here. CSS should make similar restrictions on property names (values can remain unrestricted).
The text was updated successfully, but these errors were encountered: