HTTP/1.1 (RFC 2616) has a protocol parameter called "Entity Tags" (section 3.11), defined as follows:
entity-tag = [ weak ] opaque-tag
weak = "W/"
opaque-tag = quoted-string
A quoted-string is a string of text surrounded by double quotation marks, which is parsed as a single word (section 2.2).
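The grammar above can be checked mechanically. The following is a minimal Python sketch built from RFC 2616's definitions of quoted-string, qdtext and quoted-pair. The names `ENTITY_TAG` and `is_entity_tag` are hypothetical. It assumes header continuation lines have already been unfolded, and it simplifies one point: a backslash is always read as the start of a quoted-pair, even though the grammar would also let it match qdtext.

```python
import re

# qdtext: any TEXT except the double quote. TEXT excludes control characters
# except horizontal tab (part of LWS). The backslash is also excluded here so
# that it is always read as the start of a quoted-pair (a simplification).
QDTEXT = r'[^"\\\x00-\x08\x0a-\x1f\x7f]'
# quoted-pair: a backslash followed by any US-ASCII character.
QUOTED_PAIR = r'\\[\x00-\x7f]'
QUOTED_STRING = r'"(?:' + QDTEXT + '|' + QUOTED_PAIR + r')*"'

# entity-tag = [ weak ] opaque-tag; weak = "W/"; opaque-tag = quoted-string.
# RFC 2616 treats quoted literals as case-insensitive, so "w/" is accepted too.
ENTITY_TAG = re.compile(r'([Ww]/)?' + QUOTED_STRING)


def is_entity_tag(value: str) -> bool:
    """Return True if value is exactly one entity-tag per the RFC 2616 grammar."""
    return ENTITY_TAG.fullmatch(value) is not None


for candidate in ['W/"a"', '"W/a"', 'W/a', 'a', '"A\\B"']:
    print(candidate, is_entity_tag(candidate))
```

Under this check, both W/"a" and "W/a" are syntactically valid, while unquoted values such as W/a or a are rejected.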
Comparing these strings is done in accordance with "Weak and Strong Validators" (section 13.3.3), which defines two comparison functions:
- The strong comparison function: in order to be considered equal, both validators MUST be identical in every way, and both MUST NOT be weak.
- The weak comparison function: in order to be considered equal, both validators MUST be identical in every way, but either or both of them MAY be tagged as "weak" without affecting the result.
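Taken literally, the two functions can be sketched as follows, assuming each validator has already been split into its opaque-tag and a weak flag. The `EntityTag` type and both function names are hypothetical. Here "identical in every way" is read as exact string equality of the opaque-tags, quotes included, which is precisely the interpretation in question.

```python
from typing import NamedTuple


class EntityTag(NamedTuple):
    opaque: str  # the opaque-tag exactly as received, quotes included
    weak: bool   # True if the tag carried the W/ prefix


def strong_compare(a: EntityTag, b: EntityTag) -> bool:
    # Equal only if the opaque-tags match exactly and neither tag is weak.
    return a.opaque == b.opaque and not a.weak and not b.weak


def weak_compare(a: EntityTag, b: EntityTag) -> bool:
    # Equal if the opaque-tags match exactly; weakness is ignored.
    return a.opaque == b.opaque


strong = EntityTag('"a"', weak=False)
weak = EntityTag('"a"', weak=True)
print(strong_compare(strong, strong), strong_compare(strong, weak))  # True False
print(weak_compare(strong, weak))  # True
```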
This raises an interesting question: what does "identical in every way" actually mean? If a client sends something that matches token (i.e., a string that is not surrounded by double quotation marks), is it equivalent to the corresponding quoted-string?
I would logically expect it to be, but neither IIS 6.0 nor Apache 2.0 treats it that way. If you argue that it is not equivalent, you end up asking yourself whether, e.g., Content-Type: text/plain;charset="UTF-8" refers to a character set called "UTF-8" (i.e., including the quotation marks) or to one called UTF-8 (i.e., excluding them). For compatibility, you must exclude them.
So should we strip the quotes to find the value? That would, however, make Etag: W/"a" and Etag: "W/a" equivalent. Are these identical in every way? I'd say not, because in the latter the W/ is inside the quotation marks and is therefore part of a strong entity tag. Stripping also brings back the aforementioned problem of "UTF-8" vs. UTF-8. So what are we implementers meant to do? Anyone?
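The collision is easy to reproduce with the most naive approach: deleting every double quote from the header value. `naive_strip_quotes` is a hypothetical helper written only to illustrate the problem, not a recommended parser.

```python
def naive_strip_quotes(header_value: str) -> str:
    # Deliberately naive: delete every double quote, wherever it appears.
    return header_value.replace('"', '')


# A weak tag "a" and a strong tag whose opaque value is W/a become the same string.
print(naive_strip_quotes('W/"a"'))  # W/a
print(naive_strip_quotes('"W/a"'))  # W/a
```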
Comments
James Holderness says…
October 29, 2007 06:08:35+00:00
According to the grammar, quotes are REQUIRED around an entity tag (although not around the weak prefix). So an Etag without quotes is technically illegal, and it seems reasonable for a server to treat it as different from a legal (i.e., properly quoted) entity tag.
As for the Content-Type charset parameter: that's defined as an attribute/value pair which is described in section 3.6. The value can be a token OR a quoted-string, so in that case the quotes are optional.
At least, that's my reading of the spec.
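For parameters such as charset, where the value may be either a token or a quoted-string, recovering the actual value can be sketched as below. The function name `parameter_value` is hypothetical. The sketch assumes the header has already been unfolded and split into attribute=value pairs, and it follows RFC 2616's definitions of token and quoted-pair: the surrounding quotes are dropped and each quoted-pair \X becomes X.

```python
import re

# token: one or more US-ASCII characters that are neither CTLs nor separators.
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")
# quoted-string, capturing the text between the quotes.
QUOTED_STRING = re.compile(r'"((?:[^"\\\x00-\x08\x0a-\x1f\x7f]|\\[\x00-\x7f])*)"')
# quoted-pair: a backslash followed by any US-ASCII character.
QUOTED_PAIR = re.compile(r'\\([\x00-\x7f])')


def parameter_value(raw: str) -> str:
    """Return the value of a parameter written as a token or a quoted-string."""
    if TOKEN.fullmatch(raw):
        return raw
    match = QUOTED_STRING.fullmatch(raw)
    if match:
        # Drop the surrounding quotes and replace each quoted-pair \X with X.
        return QUOTED_PAIR.sub(r'\1', match.group(1))
    raise ValueError(f'not a token or quoted-string: {raw!r}')


print(parameter_value('UTF-8'))    # UTF-8
print(parameter_value('"UTF-8"'))  # UTF-8
```

With this reading, charset=UTF-8 and charset="UTF-8" name the same character set.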
Geoffrey Sneddon says…
October 29, 2007 06:13:50+00:00
Yes, that much is fairly obvious; the real question is how we are meant to treat quotes in different places: are we meant to keep them in an Etag but strip them from a Content-Type?
James Holderness says…
October 29, 2007 06:49:57+00:00
If the quotes are optional (as is the case in Content-Type parameters), then you MUST strip them to get the real value of the parameter (obviously you'd also need to unescape quoted pairs). When they aren't optional, it's less clear what you should do.
An etag of W/"a" is definitely different from "W/a", so if you're going to be stripping quotes, the concept of weakness should be parsed and stored separately. In other words, the first example would be etag=a, weak=true; the second would be etag=W/a, weak=false.
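A minimal sketch of that approach: the weak prefix is recorded as a separate flag and only the text inside the quotes is kept as the value. `parse_etag` and `ParsedETag` are hypothetical names. The sketch leaves any quoted-pairs untouched and returns None for anything that does not match the entity-tag grammar, such as an unquoted tag.

```python
import re
from typing import NamedTuple, Optional

# [ weak ] opaque-tag; group 1 is the weak prefix, group 2 the text inside the quotes.
ETAG = re.compile(r'([Ww]/)?"((?:[^"\\\x00-\x08\x0a-\x1f\x7f]|\\[\x00-\x7f])*)"')


class ParsedETag(NamedTuple):
    value: str  # opaque-tag without its surrounding quotes
    weak: bool  # True if a W/ prefix preceded the opening quote


def parse_etag(raw: str) -> Optional[ParsedETag]:
    """Split an ETag header value into (value, weak), or return None if invalid."""
    match = ETAG.fullmatch(raw.strip())
    if match is None:
        return None
    return ParsedETag(value=match.group(2), weak=match.group(1) is not None)


print(parse_etag('W/"a"'))  # ParsedETag(value='a', weak=True)
print(parse_etag('"W/a"'))  # ParsedETag(value='W/a', weak=False)
print(parse_etag('W/a'))    # None
```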
How you deal with an etag that isn't quoted is up to you. It's not a valid etag, so you can't really be wrong. Is an etag of W/a equivalent to W/"a" or "W/a" or neither? If you want compatibility, just do whatever the servers are doing. But you can't really expect the spec to tell you because it's not valid.
Personally, I'd be more interested in seeing how servers deal with valid Etags containing unexpected quoted pairs. For example, is an etag of "A\B" equivalent to "AB"? I believe it should be, but I suspect servers will treat those as different.
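The two possible policies can be put side by side. The helper names below are hypothetical, and the sketch assumes its inputs are already valid quoted-strings. Under RFC 2616's grammar a quoted-pair \B denotes the character B, so the unescaping comparison treats "A\B" and "AB" as equal, while the octet-for-octet comparison does not.

```python
import re

# quoted-pair: a backslash followed by any US-ASCII character.
QUOTED_PAIR = re.compile(r'\\([\x00-\x7f])')


def unquote(opaque: str) -> str:
    """Strip the quotes from a quoted-string and resolve its quoted-pairs."""
    if len(opaque) < 2 or opaque[0] != '"' or opaque[-1] != '"':
        raise ValueError(f'not a quoted-string: {opaque!r}')
    return QUOTED_PAIR.sub(r'\1', opaque[1:-1])


def same_raw(a: str, b: str) -> bool:
    # Compare the opaque-tags exactly as received, backslashes included.
    return a == b


def same_unescaped(a: str, b: str) -> bool:
    # Compare the values the quoted-strings denote.
    return unquote(a) == unquote(b)


print(same_raw('"A\\B"', '"AB"'))        # False
print(same_unescaped('"A\\B"', '"AB"'))  # True
```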
Geoffrey Sneddon says…
October 30, 2007 02:44:55+00:00
I finally managed to get an answer out of someone on the HTTPBIS WG mailing list:
None of this is normatively specified anywhere. To be clear, I never intended this to be about Etags without quotes; the question is whether the quotes are part of the Etag value.