HTML Entity Encode / Decode
Encode HTML special characters (<, &, ", ', >) as named or numeric entities, or decode them back to plain text. Auto-detects direction.
<p>Hello & world</p> → &lt;p&gt;Hello &amp; world&lt;/p&gt;
HTML entity encoding replaces special characters that would otherwise be interpreted as HTML markup: < becomes &lt;, & becomes &amp;, and so on. This is required when displaying user-generated content as plain text inside a web page; without it, you've got an XSS (cross-site scripting) vulnerability.
The tool auto-detects: if the input contains entity references, it decodes; otherwise it encodes.
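The auto-detect rule can be sketched in Python with the standard-library `html` module. This is not the tool's own code (which this page does not show), and the regular expression is an assumption about what "contains entity references" means: a named reference (`&name;`), a decimal one (`&#60;`), or a hex one (`&#x3C;`).

```python
import html
import re

# Assumed definition of an "entity reference": &name;, &#digits; or &#xHEX;
ENTITY_RE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def auto_convert(text: str) -> str:
    """Decode if the input looks entity-encoded, otherwise encode it."""
    if ENTITY_RE.search(text):
        return html.unescape(text)
    # quote=True is the default, so " and ' are escaped along with &, <, >
    return html.escape(text)


print(auto_convert("<p>Hello & world</p>"))
print(auto_convert("&lt;p&gt;Hello &amp; world&lt;/p&gt;"))
```

A bare ampersand such as `Tom & Jerry` does not match the pattern, so that input is encoded rather than decoded.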
The tool scans each character in the input string and maps it to its corresponding HTML entity as defined in the HTML5 specification. For the five special characters, & (&amp;), < (&lt;), > (&gt;), " (&quot;), and ' (&apos;), it uses named entities. For other characters, it falls back to numeric character references, either decimal (&#codepoint;) or hexadecimal (&#xcodepoint;). The auto-detect logic looks for entity references in the input (an & followed by a name or number and a closing ;); if it finds any, it assumes the input is entity-encoded and decodes it by parsing each entity token and converting it back to the character at that Unicode code point.
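The mapping described above can be sketched by hand, without a library. This is an illustration under stated assumptions, not the tool's implementation: `encode`, `decode` and `hex_refs` are hypothetical names, "other characters" is taken to mean non-ASCII characters (plain ASCII passes through), and the decoder only knows the five names listed above.

```python
import re

# Named entities for the five special characters, per the description above.
NAMED = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;"}
REVERSE = {entity: char for char, entity in NAMED.items()}

# Matches &#xHEX;, &#DIGITS; or &name; tokens.
TOKEN_RE = re.compile(r"&(#[xX][0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def encode(text: str, hex_refs: bool = False) -> str:
    """Named entities for the five specials, numeric refs for non-ASCII."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in NAMED:
            out.append(NAMED[ch])
        elif cp > 127:  # assumption: plain ASCII passes through unchanged
            out.append(f"&#x{cp:X};" if hex_refs else f"&#{cp};")
        else:
            out.append(ch)
    return "".join(out)


def decode(text: str) -> str:
    """Convert each entity token back to its character in a single pass."""
    def replace(m: "re.Match") -> str:
        body = m.group(1)
        if body[:2] in ("#x", "#X"):
            return chr(int(body[2:], 16))  # no range check: sketch only
        if body.startswith("#"):
            return chr(int(body[1:]))
        # Only the five names above are known; anything else is left as-is.
        return REVERSE.get(m.group(0), m.group(0))

    return TOKEN_RE.sub(replace, text)


print(encode("a<b © 😀"))
print(decode("&lt;b&gt; &#169; &#xA9;"))
```

Because decoding is a single left-to-right pass, a double-encoded `&amp;lt;` becomes `&lt;` rather than `<`.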
Manual find‑replace and language‑specific libraries offer alternative ways to handle HTML entity encoding.
| | This tool | Python html module | Notepad++ find & replace |
|---|---|---|---|
| Ease of use | No installation; works in the browser instantly | Requires a Python script and a shell | Requires launching the editor and a multi-step regex |
| Character coverage | Full Unicode via numeric entities | html.unescape decodes all named and numeric references; html.escape covers the five special characters | Only the replacements you write explicitly; nothing automatic |
| Batch processing | One string at a time | Can process files programmatically | Manual, per document; no automation |
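For the Python column, a short sketch of what the standard-library `html` module does and does not cover. `html.escape` only touches the five special characters; getting numeric references for non-ASCII text needs an extra step, here the `xmlcharrefreplace` codec error handler.

```python
import html

s = "Café <b>\"5 > 3\"</b> & 'more'"

# html.escape replaces &, <, > and, with quote=True (the default), " and '.
escaped = html.escape(s)
print(escaped)

# Non-ASCII characters such as é are left alone by html.escape; the
# xmlcharrefreplace error handler turns them into decimal references.
ascii_only = escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(ascii_only)

# html.unescape understands named, decimal and hex references.
print(html.unescape("&eacute; &#233; &#xE9;"))
```

Wrapping these calls in a loop over a file's lines is what makes the library route suitable for batch work.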
HTML entities originate from SGML, the parent standard of HTML. The basic entities &lt;, &gt;, &amp;, and &quot; were formalized in RFC 1866 (HTML 2.0, 1995) by Tim Berners-Lee and Dan Connolly; &apos; came later, from XML, and is part of HTML5. Numeric character references were also inherited from SGML, and HTML 4.0 (1997) defined them in terms of Unicode code points, which made any international character expressible. The tool automates the encoding and decoding that developers previously did by hand or with server-side functions.
If you want to show <div> as text rather than render it, encode the angle brackets first so the page source contains &lt;div&gt;.
Before inserting user-typed content into the DOM, entity-encode to prevent script injection. Modern frameworks do this automatically; raw HTML strings need manual escaping.
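A minimal sketch of manual escaping when HTML is built as a raw string. `render_comment` is a hypothetical helper; in practice a templating engine with autoescaping usually does this for you.

```python
import html


def render_comment(author: str, body: str) -> str:
    # Escape each user-supplied value before interpolating it into markup,
    # so a "<script>" in the input arrives as inert text.
    return (
        '<div class="comment">'
        f"<strong>{html.escape(author)}</strong>: {html.escape(body)}"
        "</div>"
    )


print(render_comment("mallory", "<script>alert(1)</script>"))
```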
Some APIs return entity-encoded HTML in JSON values. Decoding makes them human-readable.
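Decoding such a value takes two steps: parse the JSON, then unescape the string field. The payload below is a made-up example, not a real API response.

```python
import html
import json

# Hypothetical API response whose "title" field holds entity-encoded text.
payload = '{"title": "Tips &amp; tricks for &lt;canvas&gt;"}'

data = json.loads(payload)
title = html.unescape(data["title"])
print(title)  # Tips & tricks for <canvas>
```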
Some chat tools render & in URLs as broken entities. Encoding the share URL once before pasting fixes that edge case.
Rich tooltips and data-attributes that hold HTML need their content entity-encoded so the outer parser doesn't get confused.
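A sketch of putting an HTML snippet into an attribute. The `data-tooltip` attribute and the tooltip text are illustrative. Escaping with `quote=True` matters here: an unescaped `"` would end the attribute value early.

```python
import html

tooltip_html = '<b>Note:</b> values are "approximate"'

# quote=True (the default) also escapes the double quotes, so they cannot
# terminate the attribute value.
attr = html.escape(tooltip_html, quote=True)
element = f'<span data-tooltip="{attr}">?</span>'
print(element)
```

When the browser parses the attribute, it decodes the entities, so reading the attribute back in script returns the original snippet.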
Named: &lt;, &amp;, &quot;. Numeric: &#60;, &#38;, &#34;. Both work; named references are more readable, while numeric ones work for any character, including characters that have no named entity.
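To see that the spellings are interchangeable, the snippet below decodes the named, decimal and hex forms of `<`. It then builds numeric references straight from a code point. `numeric_refs` is a hypothetical helper, not part of any library.

```python
import html

# Three spellings of the same character decode identically.
forms = ["&lt;", "&#60;", "&#x3C;"]
print([html.unescape(f) for f in forms])  # ['<', '<', '<']


def numeric_refs(ch: str) -> tuple:
    # Numeric references only need the code point, so this works for any
    # character.
    cp = ord(ch)
    return f"&#{cp};", f"&#x{cp:X};"


print(numeric_refs("😀"))  # ('&#128512;', '&#x1F600;')
```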
The five must-encode HTML chars: <, >, &, ", '. Other characters (em dash, copyright sign) are passed through unchanged, since modern pages encoded as UTF-8 can carry them directly.
The tool uses the browser's own HTML parser via a hidden textarea. Whatever the browser decodes is what you get — guaranteed correct for any valid entity reference.
No — different escape sets. URL encoding uses %xx hex; HTML entity encoding uses &name; or &#nn;. Use URL encoding for URL components; HTML encoding for HTML content.
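Here is the difference on a single string, using Python's `urllib.parse.quote` for URL encoding and `html.escape` for HTML encoding. The `example.com` URL is a placeholder. When a URL goes inside an `href`, both apply, in order: first percent-encode the query value, then HTML-escape the whole URL for the attribute.

```python
import html
from urllib.parse import quote

s = "a&b <c>"
print(quote(s))        # a%26b%20%3Cc%3E
print(html.escape(s))  # a&amp;b &lt;c&gt;

# Percent-encode the query value first, then HTML-escape the full URL.
url = "https://example.com/search?q=" + quote(s) + "&page=2"
print(f'<a href="{html.escape(url)}">search</a>')
```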