urlsplit does not handle NFKC normalization

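Hostnames in URLs that are encoded with Punycode/IDNA are first NFKC-normalized, which replaces compatibility characters with their decomposed equivalents. Some of those decompositions contain URL delimiters, so a single character can introduce new segments into a URL. For example, U+2100 ACCOUNT OF (℀) normalizes to "a/c", and U+FF03 FULLWIDTH NUMBER SIGN (＃) normalizes to "#".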
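The effect can be reproduced with Python's standard unicodedata module. The sketch below first prints the NFKC form of a few code points whose compatibility mappings contain URL delimiters. It then takes an illustrative netloc (chosen for this example, not taken from the original report) and compares the host suggested by the raw string with the host implied by its normalized form. The host-extraction logic is a simplified assumption for demonstration, not a full RFC 3986 parser.

```python
import unicodedata

# Code points whose NFKC (compatibility) form contains URL delimiters.
for ch in ("\u2100", "\uff03", "\uff0f", "\uff20"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"{unicodedata.normalize('NFKC', ch)!r}")

# Illustrative netloc: FULLWIDTH NUMBER SIGN placed just before '@'.
netloc = "example.com\uff03@bing.com"

# Naive view of the raw string: the host is whatever follows the last '@'.
raw_host = netloc.rpartition("@")[2]

# After NFKC the fullwidth sign becomes '#', which ends the authority,
# so only the text before the first '/', '?' or '#' can hold the host.
normalized = unicodedata.normalize("NFKC", netloc)
cut = min((i for i in (normalized.find(d) for d in "/?#") if i != -1),
          default=len(normalized))
normalized_host = normalized[:cut].rpartition("@")[2]

print(raw_host)         # bing.com
print(normalized_host)  # example.com
```

The two hosts disagree, which is exactly the kind of mismatch that lets data cached for one hostname be sent to another.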

See Unicode® Technical Standard #46: Unicode IDNA Compatibility Processing.

  • Disclosure date: 2019-03-06 (Python issue bpo-36216 reported)
  • Reported at: 2019-02-16 (email to PSRT)
  • Reported by: Jonathan Birch of Microsoft Corporation and Panayiotis Panayiotou

Fixed In

Python issue

  • Python issue: bpo-36216
  • Creation date: 2019-03-06
  • Reporter: Steve Dower


Python 2.7.x through 2.7.16 and 3.x through 3.7.2 are affected by improper handling of Unicode encoding (with an incorrect netloc) during NFKC normalization. The impact is information disclosure: credentials, cookies, and other data cached against a given hostname. The affected components are urllib.parse.urlsplit and urllib.parse.urlparse. The attack vector is a specially crafted URL that could be parsed incorrectly, causing cookies or authentication data to be located and sent to a different host than the one a correct parse would yield.
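One defensive approach, sketched here under our own assumptions, is to compare a netloc with its NFKC form and refuse it if normalization creates any of the delimiters "/", "?", "#", "@" or ":". The helper nfkc_introduces_delimiters is hypothetical and is not the code used in CPython's fix. The urlsplit call at the end only illustrates that interpreters carrying the fix are expected to raise ValueError for such input, while affected versions accept it. The sketch requires Python 3.7+ (for str.isascii and urllib.parse).

```python
import unicodedata
from urllib.parse import urlsplit

URL_DELIMITERS = "/?#@:"


def nfkc_introduces_delimiters(netloc: str) -> bool:
    """Hypothetical helper: True if NFKC normalization of netloc would
    create URL delimiters that the raw string did not already contain."""
    if netloc.isascii():
        return False  # pure ASCII is unchanged by NFKC
    normalized = unicodedata.normalize("NFKC", netloc)
    return any(normalized.count(d) > netloc.count(d) for d in URL_DELIMITERS)


print(nfkc_introduces_delimiters("example.com\uff03@bing.com"))  # True
print(nfkc_introduces_delimiters("bing.com"))                    # False

# Interpreters that include the fix reject such URLs in urlsplit itself.
try:
    urlsplit("https://example.com\uff03@bing.com/")
except ValueError as exc:
    print("rejected:", exc)
else:
    print("accepted (this interpreter may predate the fix)")
```

Counting delimiters before and after normalization, rather than only checking for their presence, lets a legitimate "user@host" netloc pass while still catching a fullwidth "＠" that would add a second one.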


Timeline using the disclosure date 2019-03-06 as reference: