This page enumerates the character entities HTML provides. These are
tokens between an & (ampersand) and a ; (semicolon), of form &token;, which
an HTML renderer displays as the symbol named by the enclosed token.
Wherever an HTML document uses a character which is not part of the
character set covered by the encoding the document's web server will claim it
uses, it should represent that character by a character entity. One can also
use numeric character entities, consisting of a number (the Unicode
code-point for the character) between &# and a ; (semicolon); however,
the verbal character entities (when available) are more intelligible to anyone
reading the page source.
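
To check that a verbal entity and its numeric forms name the same character, here is a minimal Python sketch using the standard library's html module. The choice of eacute as the example is mine, not this page's; the hexadecimal form (&#x…;) is an extra that HTML also accepts.

```python
import html
from html.entities import name2codepoint

# The verbal entity &eacute; names the code-point listed in the HTML 4 table.
code = name2codepoint["eacute"]

verbal = html.unescape("&eacute;")
decimal = html.unescape(f"&#{code};")        # numeric, decimal
hexadecimal = html.unescape(f"&#x{code:X};")  # numeric, hexadecimal

print(code, verbal, decimal, hexadecimal)
```

All three spellings decode to the same character; only the verbal one tells a human reader of the source what it is.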
When a web server responds to a request for a page, it reports a (content,
as opposed to transfer) encoding, which specifies how the stream of bytes
delivered should be interpreted as characters. If the page's author used an
authoring tool which worked in some native encoding, but the server doesn't
know about this (so doesn't report it, or reports some default at odds with
it), this can confuse the user agent. Since this happens fairly often, many
user agents attempt to guess the actual encoding; but the less we rely on
programs to guess, the less scope they have for bugginess. While a page can
contain meta-data which specifies data equivalent to that in HTTP headers, it is
in principle hopeless (though in practice it may help the user agent's
guess-work) to specify the encoding (and some other details, such as MIME
content-type) this way: the user agent can't correctly read this meta-data
unless it is already correctly interpreting the byte-stream as the sequence of
characters that tells it how to read the byte-stream, a classic chicken and egg
problem. Consequently, web pages should be written in the character encoding
the web server will report for them. HTML character entities provide a way to
use characters absent from the encoding advertised by the web server.
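
One way to act on this is to pass text through a filter that replaces whatever the advertised encoding cannot represent. The following is a rough Python sketch, not a finished tool: the function name to_entities and the sample text are my own inventions, and it does not escape markup characters such as & or <.

```python
from html.entities import codepoint2name

def to_entities(text: str, encoding: str = "ascii") -> str:
    """Replace each character the encoding cannot represent by an entity.

    Prefers the verbal HTML 4 name when there is one, else falls back to
    the decimal numeric form.
    """
    out = []
    for ch in text:
        try:
            ch.encode(encoding)
            out.append(ch)
        except UnicodeEncodeError:
            name = codepoint2name.get(ord(ch))
            out.append(f"&{name};" if name else f"&#{ord(ch)};")
    return "".join(out)

sample = "caf\u00e9, 3 \u20ac, 1 \u20aa"
print(to_entities(sample, "ascii"))
print(to_entities(sample, "latin-1"))

# Python's built-in error handler does the numeric-only version of the same job.
print(sample.encode("ascii", "xmlcharrefreplace").decode("ascii"))
```

Characters the encoding does cover are left raw, so the same text comes out differently for ASCII and for Latin-1.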
Some authoring tools will allow the user to switch character sets
(and hence, typically, encodings) without stopping to warn about this issue.
While this provides a convenient way to author a page using a wide
repertoire of characters, the results are more or less guaranteed to display
unintelligibly to the page's readers. By contrast, if the page uses a
character entity that some browser does not support, the browser will typically
display the &token; verbatim; this may not look beautiful to the reader, but at
least it won't look like some arbitrary other (i.e. wrong) character. If your
authoring tool leaves raw characters in web documents, you may find the
demoronizer useful.
It ain't perfect, but I haven't written anything better yet.
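
To illustrate the contrast, here is a small Python sketch. It assumes a tool that saved UTF-8 and a reader that was told Latin-1 (both choices are mine, for the sake of example), and it uses html.unescape as a stand-in for a browser's entity decoder, which it only approximates.

```python
import html

text = "na\u00efve caf\u00e9"

# Raw characters: saved as UTF-8, but read back as Latin-1.
garbled = text.encode("utf-8").decode("latin-1")
print(garbled)

# The same text spelt with entities is pure ASCII, so it survives the mix-up.
safe = "na&iuml;ve caf&eacute;"
survived = safe.encode("ascii").decode("latin-1")
print(html.unescape(survived))

# An entity the decoder doesn't know is left as typed, not turned into
# some other character.
print(html.unescape("x &bogus; y"))
```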
The non-experimental parts of this page are derived from the HTML 4 character entity set, which supersedes and subsumes the Web Project's description of the ISO 8859 Latin 1 "ISO 8879:1986//ENTITIES Added Latin 1//EN" character entities. I provide illustrations, so you can see what's what (and whether your browser copes), and have shuffled the order. Jukka Korpela provides something similar in a table.
In due course this page needs an update to take account of stuff I've been told about Unicode; here are pages of charts and names. Ian Hickson's data: URI kitchen can also be useful … and, speaking of Ian, HTML 5 has a much expanded repertoire of character entities, which I should document some day. Some entries from it are included below.
Another update, possibly superseding the preceding: in April 2010, the W3C MathML WG published its entity definitions for characters, which aim to be fairly comprehensive. The set is way too big to assimilate here.
In an attempt to make it easier to find particular characters, I've also broken the list into logical groups:
Aside from the table of greek letters, each entry is of form:
in which the bold punctuation won't be bold in actual entries, portions enclosed in […] aren't always present (and the [ and ] themselves never are),
=, get omitted for characters not covered by iso-accents-list.
Note that raw, numeric and emacs forms are only provided for actual ISO 8859 Latin-1 characters; and that only browsers which claim to support HTML 5 can be complained at for failure to support the rest.
Aside for C programmers: including an ISO Latin-1 ÿ (represented by character code 255) in the text read by a C program is a very simple way to test whether the author was rash enough to collect answers from getc() in char variables – in which case, ÿ looks like an end-of-file marker (or the program fails to spot end of file).
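
The arithmetic behind that aside can be modelled without a C compiler. This Python sketch assumes an 8-bit two's-complement char and EOF equal to -1 (the usual values, though C only promises that EOF is negative); the helper name as_signed_char is mine.

```python
import struct

EOF = -1  # assumed: the value getc() conventionally returns at end of file

def as_signed_char(byte: int) -> int:
    """Reinterpret an unsigned byte (0..255) as an 8-bit signed value."""
    return struct.unpack("b", bytes([byte]))[0]

y_diaeresis = "\u00ff".encode("latin-1")[0]  # the byte for ÿ in Latin-1

# Where char is signed, ÿ stored in a char compares equal to EOF.
print(y_diaeresis, as_signed_char(y_diaeresis), as_signed_char(y_diaeresis) == EOF)

# Where char is unsigned, EOF stored in a char becomes 255 and never
# compares equal to EOF again, so end of file goes unnoticed.
print(EOF & 0xFF, (EOF & 0xFF) == EOF)
```

Either way, keeping getc()'s answer in an int until it has been compared with EOF avoids the problem.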
sharp s (sz ligature); N and n with tilde; C and c with cedilla; S and s with caron; the Icelandic characters Thorn (first in upper case, then lower) and Eth (likewise).
euro sign
registered, should usually be put in a superscript, e.g. ACME®
trademark, e.g. ACME™, no need to superscript.
single angle quotation mark
Dirac angle bracket, bra and ket (Jukka says these aren't the proper angle brackets, and that U+27E8 and U+27E9 should be used instead, but mentions that HTML5 has &[lr]ang; mean these, matching common browser practice.)
curly braces; but ASCII's {m} work fine.
ceiling, left is a.k.a. apl upstile
floor, left is a.k.a. apl downstile
horizontal ellipsis, three dot leader
en dash
em dash
single low-9 quotation mark
double low-9 quotation mark
em space, width of letter m
en space, width of letter n
number space, the width of a digit
punctuation space, the width of narrow punctuation
thin space, useful for grouping digits in long numbers, as in 12 345 678
very thin space, a little too narrow for the previous: 12 345 678
zero width non-joiner
zero width joiner
right-to-left mark
left-to-right mark
bullet
latin small f with hook, function, florin
per mille sign, cf. ASCII percent %
prime, minutes, feet
double prime, seconds (of angle), inches
overline
dagger
double dagger
lozenge, hollow diamond suit
black spade suit
filled club suit, shamrock
filled heart suit, valentine
filled diamond suit
See also floor and ceiling enclosures, above; þeoretical physicists should also see the Icelandic letter eth, above; and the section on arrows.
fraction slash; I just use the ASCII solidus, /, as it's easier to type.
minus sign (cf. - the hyphen)
dot operator
invisible times, here between x and y: x⁢y, but it lives up to its name, so isn't much help to a reader.
asterisk operator, compare the ASCII asterisk: p*q
circled plus, direct sum
circled times, vector product
logical and, wedge
logical or, vee, vel
intersection
union
arithmetic relations
less than or equal to
greater than or equal to
much greater than
much less than
not equal to
identical to
approximately equal to, isomorphic to
almost equal to, asymptotic to
proportional to
tilde operator, varies with, similar to
is a subset of
is a superset of
is not a subset of
is a subset of or equal to
is a superset of or equal to
is an element of
is not an element of
contains as member
for all
partial differential; see also ð for theoretical physicists
there exists
nabla, backward difference, the 3-spatial vector differential operator; contrast with Greek Δ
n-ary product, product sign; contrast with Greek Π
n-ary summation, sum sign; contrast with Greek Σ
square root, radical sign
integral
double integral
contour integral
double contour integral
anticlockwise integral
anticlockwise contour integral
therefore, though why the idiots couldn't have used &thus; (a perfectly good synonym for "therefore", with the bonus of reading as "and thus") or &so; (similarly justified) rather than the hideously clumsy pun there4 (presumably motivated by the desire for brevity, which "thus" and "so" attain better) is beyond me. But, apparently, it's ISO 8879:1986's fault, not W3C's.
empty set, null set, diameter
aleph (the letter of the Hebrew alphabet; ℵ₀ denotes the first transfinite cardinal)
infinity
blackletter capital I, imaginary part
blackletter capital R, real part symbol
script capital P, power set, Weierstrass p
double-struck capital N, which denotes the set of natural (counting) numbers.
angle
up tack, is orthogonal to, perpendicular
upper case form; thus &psi; and &Psi; produce the small and capital forms of psi, for example.
name | small | big |
---|---|---|
alpha | α | Α |
beta | β | Β |
gamma | γ | Γ |
delta | δ | Δ |
epsilon | ε | Ε |
zeta | ζ | Ζ |
eta | η | Η |
theta | θ | Θ |
iota | ι | Ι |
kappa | κ | Κ |
lambda | λ | Λ |
mu | μ | Μ |
nu | ν | Ν |
xi | ξ | Ξ |
omicron | ο | Ο |
pi | π | Π |
rho | ρ | Ρ |
sigma | σ | Σ |
tau | τ | Τ |
upsilon | υ | Υ |
phi | φ | Φ |
chi | χ | Χ |
psi | ψ | Ψ |
omega | ω | Ω |
The HTML standard also blesses the following lower-case greek letter variants (with no upper-case versions):
pomega)
Some browsers (e.g. grail) support variants for some lower-case
letters, named by adding a v to the end of the usual letter's name: the
cases I know of are &thetav; (ϑ) for theta (cf. thetasym) and &sigmav; (ς)
for sigma; but other browsers don't support these, so a prudent author will
abstain from using them. See also &micro;, µ, and &szlig;, ß, which are very
like μ and β, respectively.
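
The naming rule for the table above, and the variants just mentioned, can be checked against Python's copy of the entity tables. This is only a sketch: the handful of letters sampled is my choice, and html.unescape follows the HTML5 table, which may be more generous than any particular browser.

```python
import html
import unicodedata
from html.entities import name2codepoint

# Capitalising the entity name's first letter gives the upper-case letter.
for name in ("alpha", "beta", "gamma", "psi", "omega"):
    small = chr(name2codepoint[name])
    big = chr(name2codepoint[name.capitalize()])
    print(f"{name:8} {small} {big}")

# The v-suffixed variants, beside the HTML 4 names for the same shapes.
for name in ("thetav", "thetasym", "sigmav", "sigmaf"):
    ch = html.unescape(f"&{name};")
    print(f"&{name};", ch, unicodedata.name(ch))

# micro and mu look alike but are distinct code-points.
print(hex(name2codepoint["micro"]), hex(name2codepoint["mu"]))
```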
There may be more mathematical symbols in a table in W3.org's tour of HTML 3. See also W3.org's table of HTML MATH mode symbols, if it still exists.
There are a few font-styles that have been widely adopted in mathematics to
provide distinct forms of letters that have thereby taken on their own meanings;
thus ℕ and its friends are from an 𝕠𝕡𝕖𝕟
type-face, whose letters can be obtained by putting opf;
after the plain
ASCII letter and an & before, e.g. ℂ, ℕ, ℚ, ℝ and
ℤ. There's also a rather 𝔊𝔬𝔱𝔥𝔦𝔠
type-face,
using suffix fr;
in the same way, but I find it mostly unreadable: for
example, 𝔄 is its A, which I would flatly fail to recognise if I hadn't just
looked it up in the table; compare 𝔘 (which is U). Then there's
a 𝓈𝒸𝓇𝒾𝓅𝓉
font-face, using
suffix scr;
to get 𝒜, ℬ, … 𝒴, 𝒵, 𝒶,
… 𝒾, 𝒿, 𝓀, … 𝓏. It's OK, but I doubt I'll use
it much.
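
The suffix scheme is regular enough to drive from a loop. Here is a short Python sketch, relying on html.unescape knowing the HTML5 names; the helper name styled and the letters sampled are mine.

```python
import html
import unicodedata

def styled(letter: str, suffix: str) -> str:
    """Decode &<letter><suffix>; for example styled("N", "opf")."""
    return html.unescape(f"&{letter}{suffix};")

for suffix in ("opf", "fr", "scr"):
    for letter in "ANU":
        ch = styled(letter, suffix)
        print(f"&{letter}{suffix};", ch, unicodedata.name(ch))
```

The Unicode names it prints show that a few of these letters (ℕ, for one) sit in the older Letterlike Symbols block rather than among the mathematical alphanumerics.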
There's also at least some of the Hebrew alphabet: ℵ, ℶ, ℷ, …, but that's as far as it seems to get (early 2018).
There are up to eight directions for arrows – each way horizontal or vertical and each of the diagonals between these – and many styles of arrow, albeit not every style has all directions.
shift-tab key symbol
tab key symbol
carriage return symbol
hk for left and right, but replace arr with arhk for the diagonals. Illustrating with only a few directions for each:
implies
if and only if or iff
x prefix extends (lengthens) the arrow; an n prefix crosses them with a diagonal line (negating their meaning, when they're used in mathematics); on double-stemmed arrows, an nv prefix crosses the stems perpendicularly, albeit hArr becomes Harr for this. A suffix w makes a both-ways or right arrow wavy; it doesn't work for left arrows, but does for the wavy right arrow. Inserting o between the direction letter and arr gets you an arrow whose head isn't filled in.
approximate arrow)
harpoons, which are arrows with one side of their head missing. These don't do diagonal directions, but the simple ones do go in all horizontal and vertical directions – and we have to specify which half of the head each has:
is in equilibrium with
There's a whole mess of others, such as right angle with downwards zig-zag, ⍼, but this starts to get into unrecognisable gibberish.
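
Since not every style comes in every direction, it helps to ask the entity table which arrow names actually exist before using one. A minimal Python sketch follows; the list of names tried is my own sample built from the prefixes and suffixes described above, and the loop reports whatever the table lacks rather than assuming it is there.

```python
import unicodedata
from html.entities import html5

def arrow(name: str):
    """Return the character(s) HTML5 assigns to &name;, or None if none."""
    return html5.get(name + ";")

names = ["rarr", "xrarr", "nrarr", "rArr", "hArr", "nvHarr",
         "rarrw", "harrw", "larrw", "roarr", "larrhk", "nearhk"]

for name in names:
    ch = arrow(name)
    if ch is None:
        print(f"&{name}; is not defined")
    else:
        # A few entities expand to more than one code-point, so name each.
        label = " + ".join(unicodedata.name(c) for c in ch)
        print(f"&{name}; {ch} {label}")
```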
I lob experiments in here to see if they work. Some are inspired by TeX, others by wishful thinking and randomness. W3.org doesn't sanction them and I don't necessarily think they're a good idea.
centered dots
And if you think I should have done all that with tables (I admit I've been tempted – much of it is crying out for it; and I've succumbed for the Greek alphabet), please pause to consider that even under Lynx I can use this list-form to find the right code to type into a file I'm writing. If I did it with tables, it would look a total mess under browsers that don't support them, which would greatly diminish its utility. Meanwhile the present form works fine in, Arena, Mozilla, Grail – all free (as in liberty) – as well as the proprietary (but gratis) Opera and (gratis once you've paid for the operating system they want you to use) IE.
Back in 2002, I suggested to the W3C's CSS folk that maybe it'd be a good idea for style sheet mechanisms to provide for mapping style-sheet-defined &…; tokens to official ones. To illustrate why this would be useful, consider:
wedge character, an inverted v, in at least two ways; as "logical and" and as the "antisymmetric product" operator of linear algebra (where it has, again, two meanings: one specific to three-dimensional space, combining vectors to yield a vector; the other applicable to a broad class of tensors; but I'll ignore this ambiguity for the moment).
Similarly, one might wish to have &tensor; map to ⊗, &union; to
∪, ⇔ to ⇔ and so on, enabling mnemonic
names even for
character entities with only one (orthodox) reading. When co-opting some
Unicode character to serve some particular purpose, it would likewise make sense
to give it a mnemonic name, indicative of that purpose, thereby abstracting away
the choice of particular Unicode character selected to denote it.
It turns out this can be achieved in various ways.
    br.and, br.wedge { content: "∧"; }
    br.unite { content: "∪"; }
    …
Downsides: requires you to use the relevant raw Unicode character in your style sheet; and, officially, content only applies to the :before and :after pseudo-elements.
HTML elements, <wedge> and similar; use a style sheet as above to specify their contents. Same downsides as above, but it's malformed HTML instead of abusing real HTML.
    <?xml version="1.0"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
      "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" [
      <!ENTITY wedge "∧">
      <!ENTITY unite "∪">
      <!-- … -->
    ]>
Downsides: only works for XML (including XHTML), and I haven't yet found a way to put the entity definitions in a separate file and import them to each document: so each document has to duplicate the whole mess.
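
The DTD approach can be tried out with any XML parser that honours the internal subset. Here is a minimal Python sketch using the standard library's ElementTree. To keep it self-contained I have assumed a bare p document with no external DTD, and written the replacement text as numeric references rather than raw characters; neither detail comes from the declaration above.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<!DOCTYPE p [
  <!ENTITY wedge "&#8743;">
  <!ENTITY unite "&#8746;">
]>
<p>a &wedge; b, A &unite; B</p>"""

# The parser expands the locally declared entities while reading the text.
root = ET.fromstring(doc)
print(root.text)
```

The mnemonic names exist only within this one document, which is exactly the duplication complained of above.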
I still think, fundamentally, that the semantic web would be better
served by scrapping the whole ghastly mess of character entities in the DTDs and
replacing them with a style-sheet-based approach (or an approach similar to that
taken by style sheets). The W3C could perfectly readily provide a standard set
of style sheets specifying the present entities (and browsers could still have
these built in) for backwards compatibility, but authors would be enabled to
provide domain-specific semantic names for the characters they're using and
@import
the default specs. Doing it via style-sheets is more
compatible with existing infrastructure than doing it via DTDs and, in any case,
what we're doing here is specifying presentation for character
entities, so it belongs in style sheets.