This document is part of the US-CERT website archive. These documents are no longer updated and may contain outdated information. Links may also no longer function. Please contact if you have any questions about the US-CERT website archive.

Ensure that Input Is Properly Canonicalized

Author(s): William L. Fithen
Maturity Levels and Audience Indicators: L4 / D/P
SDLC Life Cycles: Implementation
Copyright: © Carnegie Mellon University 2005-2012.


Failure to canonicalize input can introduce vulnerabilities. Inadvertently canonicalizing input more than once can also introduce vulnerabilities.


Canonicalization is the process of transforming a potentially flexible data structure into one that has guaranteed characteristics. It is a common technique in input data validation. For example, the same input "characters" can be encoded in many ways, ranging from 7-bit ASCII to variable-width multibyte Unicode. Before a program that accepts such input uses it, the input frequently must be transformed into a canonical form that is universal (in the context of the program). Otherwise, even simple text comparisons (e.g., length, equality, ordering) cannot be made reliably.
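The effect on simple comparisons can be illustrated with a minimal Python sketch. It assumes that the program's canonical form is UTF-8 decoded text normalized to Unicode NFC; that choice, the helper name canonical_text, and the sample string are illustrative assumptions, not part of the original guidance.

```python
import unicodedata

def canonical_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes and normalize to NFC (the canonical form assumed in this sketch)."""
    return unicodedata.normalize("NFC", raw.decode(encoding))

# Two byte sequences for the same user-visible string "café".
precomposed = "caf\u00e9".encode("utf-8")   # é as a single code point
decomposed = "cafe\u0301".encode("utf-8")   # e followed by a combining acute accent

# Without canonicalization, equality and length comparisons disagree.
raw_equal = precomposed.decode("utf-8") == decomposed.decode("utf-8")
raw_lengths = (len(precomposed.decode("utf-8")), len(decomposed.decode("utf-8")))

# After canonicalization, both inputs compare as the same text.
canon_equal = canonical_text(precomposed) == canonical_text(decomposed)
canon_lengths = (len(canonical_text(precomposed)), len(canonical_text(decomposed)))

print(raw_equal, raw_lengths)      # comparisons on raw input
print(canon_equal, canon_lengths)  # comparisons on canonical input
```

Any validation or comparison logic would then operate only on the value returned by canonical_text, never on the raw input.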

For extensive coverage of this issue, see [Howard 02, Chapter 10: All Input Is Evil!].

Failure to Canonicalize (When It Was Needed)

When input with identical semantics can be supplied in multiple syntaxes, it is usually wise to define one of the syntaxes as "canonical" and transform all of the other representations into that one before using the input. Even better is to reject all input that is not already canonical [Hoglund 04].
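Both approaches can be sketched in Python for one common case, file path syntax. The sketch assumes POSIX-style paths and treats the output of posixpath.normpath as the canonical form; the function names and sample paths are hypothetical. Note that normpath is purely lexical and does not resolve symbolic links, so a real program may need a stronger definition of "canonical".

```python
import posixpath

def canonical_path(path: str) -> str:
    """Transform a path into the canonical syntax assumed here (lexical normalization)."""
    return posixpath.normpath(path)

def require_canonical(path: str) -> str:
    """Stricter policy: reject any input that is not already in canonical form."""
    if path != canonical_path(path):
        raise ValueError(f"non-canonical path rejected: {path!r}")
    return path

# Approach 1: transform every representation into the canonical one.
print(canonical_path("/srv/data/./reports/../report.txt"))

# Approach 2: accept only canonical input and refuse everything else.
print(require_canonical("/srv/data/report.txt"))
try:
    require_canonical("/srv/data/../secret.txt")
except ValueError as exc:
    print(exc)
```

The stricter policy trades convenience for simplicity: the program never has to reason about what an unusual representation might mean, because it never accepts one.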

Redundant Canonicalization (Which Is Not Idempotent)

When canonicalization of input is required, be sure that it occurs only once. In many representations, canonicalization is not idempotent, so it is not safe to canonicalize input that has already been canonicalized [VU#580299].
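URL percent-decoding is one representation where a second pass changes the result. The following Python sketch is an illustration of that general hazard, not a reproduction of the cited vulnerability; the is_safe check and the request string are hypothetical.

```python
from urllib.parse import unquote

def is_safe(path: str) -> bool:
    """Hypothetical validation: reject any path with a parent-directory segment."""
    return ".." not in path.split("/")

# Attacker-supplied value in which the percent sign itself is encoded (%25).
request = "docs/%252e%252e/secret.txt"

once = unquote(request)   # the single, intended decoding pass
twice = unquote(once)     # an accidental second pass in a later layer

print(once, is_safe(once))    # the validated value
print(twice, is_safe(twice))  # the value a second decode would actually use
```

Because decoding the already decoded value yields a different string, a check performed after the first pass says nothing about what a later component sees if it decodes again. Decoding exactly once, and validating the result of that one pass, avoids the mismatch.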


References

[Hoglund 04] Hoglund, Greg & McGraw, Gary. Exploiting Software: How to Break Code. Boston, MA: Addison-Wesley, 2004.

[Howard 02] Howard, Michael & LeBlanc, David. Writing Secure Code, 2nd ed. Redmond, WA: Microsoft Press, 2002.

[VU#580299] MacInnis, Ken. Vulnerability Note VU#580299: Microsoft Internet Explorer contains URL decoding cross-domain vulnerability. June 14, 2005.