gsnedders

Holiday? You get no holiday from teh future of teh intarwebs!

HTTP Entity Tags Confusion

October 29, 2007 (4 comments)

HTTP/1.1 (RFC 2616) has a protocol parameter called "Entity Tags" (section 3.11), defined as follows:

entity-tag = [ weak ] opaque-tag
weak = "W/"
opaque-tag = quoted-string

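A quoted-string is a string of text surrounded by double quotation marks and parsed as a single word (section 2.2). Inside it, a backslash followed by any character (a "quoted-pair") stands for that character.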
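As a rough sketch of what this grammar implies, the Python below parses an entity-tag into a weak flag and an opaque value with its quotes removed and quoted-pairs resolved. Keeping the dequoted text as the tag's value is one possible reading, not something the spec spells out. The names EntityTag, unquote and parse_entity_tag are hypothetical, and the sketch does not check the character restrictions RFC 2616 places on the contents of a quoted-string.

```python
from typing import NamedTuple


class EntityTag(NamedTuple):
    weak: bool    # True if the W/ prefix was present
    opaque: str   # opaque-tag value: quotes removed, quoted-pairs resolved


def unquote(quoted: str) -> str:
    """Return the contents of a quoted-string: drop the surrounding double
    quotes and replace each quoted-pair (backslash + character) with the
    character itself."""
    if len(quoted) < 2 or quoted[0] != '"' or quoted[-1] != '"':
        raise ValueError(f"not a quoted-string: {quoted!r}")
    out = []
    chars = iter(quoted[1:-1])
    for ch in chars:
        if ch == "\\":
            ch = next(chars, None)  # the escaped character, taken literally
            if ch is None:  # the closing quote itself was escaped
                raise ValueError(f"unterminated quoted-string: {quoted!r}")
        elif ch == '"':
            raise ValueError(f"unescaped quote in quoted-string: {quoted!r}")
        out.append(ch)
    return "".join(out)


def parse_entity_tag(value: str) -> EntityTag:
    """Parse entity-tag = [ weak ] opaque-tag, with weak = "W/"."""
    # RFC 2616 literals are case-insensitive unless stated otherwise
    # (section 2.1), so a lowercase "w/" is accepted too.
    weak = value[:2].upper() == "W/"
    if weak:
        value = value[2:]
    return EntityTag(weak, unquote(value))


print(parse_entity_tag('W/"a"'))  # EntityTag(weak=True, opaque='a')
print(parse_entity_tag('"W/a"'))  # EntityTag(weak=False, opaque='W/a')
```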

Comparing entity tags is done according to "Weak and Strong Validators" (section 13.3.3), which defines two comparison functions:

  • The strong comparison function: in order to be considered equal, both validators MUST be identical in every way, and both MUST NOT be weak.
  • The weak comparison function: in order to be considered equal, both validators MUST be identical in every way, but either or both of them MAY be tagged as "weak" without affecting the result.
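The two comparison functions can be sketched on top of already-parsed tags. This assumes, hypothetically, that "identical in every way" means the unquoted opaque values are equal, which is precisely the point in question; EntityTag, strong_match and weak_match are illustrative names.

```python
from typing import NamedTuple


class EntityTag(NamedTuple):
    weak: bool    # True if the W/ prefix was present
    opaque: str   # opaque-tag value, already unquoted


def strong_match(a: EntityTag, b: EntityTag) -> bool:
    """Strong comparison: identical opaque values and neither tag is weak."""
    return not a.weak and not b.weak and a.opaque == b.opaque


def weak_match(a: EntityTag, b: EntityTag) -> bool:
    """Weak comparison: identical opaque values; weakness is ignored."""
    return a.opaque == b.opaque


strong_a = EntityTag(weak=False, opaque="a")
weak_a = EntityTag(weak=True, opaque="a")
print(strong_match(strong_a, strong_a))  # True
print(strong_match(weak_a, strong_a))    # False
print(weak_match(weak_a, strong_a))      # True
```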

This is interesting: what does "identical in every way" actually mean? If a client sends something that matches the token rule (i.e., a string without double quotation marks surrounding it), is it equivalent to the quoted-string?

I would logically expect it to be; however, neither IIS 6.0 nor Apache 2.0 treats it as equivalent. If you argue that it is not, you end up asking yourself whether, e.g.,

Content-Type: text/plain;charset="UTF-8"

refers to a character set called "UTF-8" (i.e., including the quotation marks) or to one called UTF-8 (i.e., excluding the quotation marks). For compatibility, you must exclude them.
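For a parameter such as charset, whose value may be either a token or a quoted-string, excluding the quotes might look like the following sketch. parameter_value is a hypothetical helper; it returns tokens unchanged without validating their characters.

```python
def parameter_value(raw: str) -> str:
    """Return the value of a parameter where value = token | quoted-string.

    A quoted-string has its surrounding quotes removed and each quoted-pair
    (backslash + character) replaced by the character; anything else is
    treated as a token and returned as-is.
    """
    if not raw.startswith('"'):
        return raw  # token: there are no quotes to remove
    if len(raw) < 2 or not raw.endswith('"'):
        raise ValueError(f"unterminated quoted-string: {raw!r}")
    out = []
    escaped = False
    for ch in raw[1:-1]:
        if escaped:
            out.append(ch)  # character after a backslash is taken literally
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            raise ValueError(f"unescaped quote in quoted-string: {raw!r}")
        else:
            out.append(ch)
    if escaped:  # the closing quote was escaped
        raise ValueError(f"unterminated quoted-string: {raw!r}")
    return "".join(out)


print(parameter_value('"UTF-8"'))  # UTF-8
print(parameter_value("UTF-8"))    # UTF-8
```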

So should we strip the quotes out to find the value? This would, however, make

Etag: W/"a"

and

Etag: "W/a"

equivalent. Are these identical in every way? Here I'd say no: in the latter, the W/ is inside the quotation marks and is therefore part of a strong identifier. This also brings back the aforementioned problem of "UTF-8" versus UTF-8. So what are we implementers meant to do? Anyone?
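To make the collision concrete, here is a minimal Python comparison of deleting every quote versus splitting off the W/ prefix first. Both helpers are hypothetical, and split_weak ignores quoted-pairs for brevity.

```python
def naive_unquote(value: str) -> str:
    """Hypothetical naive approach: delete every double quote in the value."""
    return value.replace('"', "")


def split_weak(value: str) -> tuple[bool, str]:
    """Split off the W/ prefix, which lies outside the quotes, and only then
    drop the surrounding quotes (quoted-pairs are not handled here)."""
    weak = value.startswith("W/")
    if weak:
        value = value[2:]
    if len(value) < 2 or value[0] != '"' or value[-1] != '"':
        raise ValueError(f"opaque-tag is not quoted: {value!r}")
    return weak, value[1:-1]


for header_value in ('W/"a"', '"W/a"'):
    print(header_value, naive_unquote(header_value), split_weak(header_value))
```

naive_unquote turns both values into W/a, whereas split_weak keeps them apart as (True, 'a') and (False, 'W/a').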

Comments

  1. James Holderness
    says

    October 29, 2007 06:08:35+00:00

    According to the grammar, quotes are REQUIRED around an entity tag (although not around the weak prefix). So an Etag without quotes is technically illegal, and thus it seems reasonable for a server to treat that as different from a legal entity tag (i.e. properly quoted).

    As for the Content-Type charset parameter: that's defined as an attribute/value pair which is described in section 3.6. The value can be a token OR a quoted-string, so in that case the quotes are optional.

    At least, that's my reading of the spec.

  2. Geoffrey Sneddon
    says

    October 29, 2007 06:13:50+00:00

    Yes, that is mostly rather obvious; the real question is how we are meant to treat quotes in different places: are we meant to keep them in an Etag, but not in a Content-Type?

  3. James Holderness
    says

    October 29, 2007 06:49:57+00:00

    If the quotes are optional (as is the case in Content-Type parameters) then you MUST strip them to get the real value of the parameter (obviously you'd also need to unescape quoted pairs). When they aren't optional, it's less clear what you should do.

    An etag of W/"a" is definitely different from "W/a", so if you're going to be stripping quotes, the concept of weakness should be parsed and stored separately. In other words, the first example would be etag=a, weak=true; the second would be etag=W/a, weak=false.

    How you deal with an etag that isn't quoted is up to you. It's not a valid etag, so you can't really be wrong. Is an etag of W/a equivalent to W/"a" or "W/a" or neither? If you want compatibility, just do whatever the servers are doing. But you can't really expect the spec to tell you because it's not valid.

    Personally, I'd be more interested in seeing how servers deal with valid Etags with unexpected quoted pairs. For example, is an etag of "A\B" equivalent to "AB"? I believe it should be, but I suspect servers will treat those as different.

  4. Geoffrey Sneddon
    says

    October 30, 2007 02:44:55+00:00

    I finally managed to get an answer out of someone on the HTTPBIS WG mailing list:

    To compare two quoted-string elements you need to dequote them including removing escapes, but in practice it doesn't matter much as people are not usually escaping things within quoted-string unless needed (but sometimes forget when needed, partly due to poor specifications, already fixed).

    This is quite notable in for example Digest authentication where proper handling of quoted-string is required for the hashes to compute properly as they are based on the value as such and not the quoted-string representation. (i.e a login name with " or \ in it..)

    It's in theory also needed for ETag processing, but it's less noticeable as impacts on the protocol of getting this wrong is pretty minimal.

    None of this is normatively specified anywhere. I never intended this to be about Etags without quotes; the question is whether the quotes are part of the Etag value.
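    The dequote-then-compare rule from the mailing-list reply can be sketched like this, using the "A\B" versus "AB" example from the earlier comment. dequote and same_quoted_string are hypothetical names, and the regular expression accepts any character inside the quotes rather than enforcing RFC 2616's exact rules for quoted-string contents.

```python
import re

# A whole quoted-string: a quote, then plain characters or backslash pairs, then a quote.
_QUOTED_STRING = re.compile(r'"(?:[^"\\]|\\.)*"', re.DOTALL)
# A single quoted-pair: a backslash followed by any one character.
_QUOTED_PAIR = re.compile(r"\\(.)", re.DOTALL)


def dequote(quoted: str) -> str:
    """Drop the surrounding quotes of a quoted-string and resolve quoted-pairs."""
    if not _QUOTED_STRING.fullmatch(quoted):
        raise ValueError(f"not a quoted-string: {quoted!r}")
    return _QUOTED_PAIR.sub(r"\1", quoted[1:-1])


def same_quoted_string(a: str, b: str) -> bool:
    """Compare two quoted-strings by value rather than by representation."""
    return dequote(a) == dequote(b)


print(r'"A\B"' == '"AB"')                     # False: the raw text differs
print(same_quoted_string(r'"A\B"', '"AB"'))  # True: both dequote to AB
```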
