Chinese Pinyin in HTML Ascii

A long time ago, I wrote a page about how to correctly display Sanskrit, Pali and Buddhist terms with diacritics in HTML. Compared to older methods, using HTML and extended ASCII codes is a much better, more portable method once you learn the code numbers. :)

That post was a reference for myself, to fulfill a certain need, but recently I’ve run into similar problems with Chinese Pinyin. Most academic texts, when writing Chinese words in English, use the old Wade-Giles system, which I find confusing. When I studied Mandarin in high school, we used Pinyin, which was devised in the People’s Republic of China, chiefly by a man named Zhōu Yǒuguāng (sounds like “Joe Yo Gwang”). I prefer the Pinyin system, and here’s why:

  • The Pinyin system tends to assign unique letters to each sound. Once you know how to read each letter, it’s pretty straightforward and consistent.
  • Wade-Giles doesn’t usually have tone marks, but does have confusing apostrophes.
  • Pinyin is the official system of the PRC, and thus more useful in today’s global community. For example, it is what you see at the Olympics when Chinese athletes’ names are shown.
  • Mandarin Chinese as a whole is pretty easy to pronounce and read (compared to other varieties such as Wu, Yue and Hakka), so you can learn Pinyin very quickly.

So, this post shows how to express Pinyin using HTML ASCII codes. The idea is to write correct Pinyin, with all four tone marks, consistently in HTML. Most Pinyin letters are plain English letters, so nothing special there, but the tone marks sit on top of vowels, and those marked vowels need special HTML character codes to display properly. Full credit goes to this website for listing the entire ASCII table.

Format

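All of these letters in HTML have the same format (technically it is a numeric character reference, and the number is the letter’s Unicode code point in decimal rather than a true ASCII code):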

&#(number);

So, the trick is just remembering which number you want and filling in the blank. For ā, you type an &, a #, the number 257 and then a ;, which gives &#257;. For other letters, Pinyin tones and such, just change the number; the rest stays the same.
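
If you would rather not type the codes by hand, here is a minimal sketch in Python that does the fill-in-the-blank step for a whole string. The function name to_html_codes is just something I made up for illustration; the "xmlcharrefreplace" error handler and html.unescape are part of Python's standard library.

```python
import html

def to_html_codes(text):
    """Replace every non-ASCII character with its &#(number); form."""
    # "xmlcharrefreplace" swaps each character that plain ASCII cannot
    # hold for a decimal numeric character reference.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_html_codes("Zhōu Yǒuguāng"))  # Zh&#333;u Y&#466;ugu&#257;ng

# Going the other way: turn the codes back into letters.
print(html.unescape("&#257;"))  # ā
```

Plain English letters pass through untouched, so only the vowels with tone marks get converted.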

Pinyin tonemarks in HTML ASCII

This is in order of the vowels in English (a, e, i, o and u), and with the four tones in Mandarin Chinese:

  • ā – 257, ‘a’ with the high 1st tone.
  • á – 225, ‘a’ with rising 2nd tone.
  • ǎ – 462, ‘a’ with the “dipping” 3rd tone.
  • à – 224, ‘a’ with falling 4th tone.
  • ē – 275, ‘e’ with 1st tone.
  • é – 233, ‘e’ with 2nd tone.
  • ě – 283, ‘e’ with 3rd tone.
  • è – 232, ‘e’ with 4th tone.
  • ī – 299, ‘i’ with 1st tone.
  • í – 237, ‘i’ with 2nd tone.
  • ǐ – 464, ‘i’ with 3rd tone.
  • ì – 236, ‘i’ with 4th tone.
  • ō – 333, ‘o’ with 1st tone.
  • ó – 243, ‘o’ with 2nd tone.
  • ǒ – 466, ‘o’ with 3rd tone.
  • ò – 242, ‘o’ with 4th tone.
  • ū – 363, ‘u’ with 1st tone.
  • ú – 250, ‘u’ with 2nd tone.
  • ǔ – 468, ‘u’ with 3rd tone.
  • ù – 249, ‘u’ with 4th tone.
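
If you want to double-check the numbers above, or look up a letter that is not listed, the following Python sketch rebuilds the list from scratch. It assumes the four tone marks correspond to the Unicode combining macron, acute, caron and grave; tone_codes is a made-up helper name. I also included ü (code 252), which Pinyin uses in words like nǚ, even though it is not in my list.

```python
import unicodedata

# Combining marks for tones 1 to 4: macron, acute, caron, grave.
TONE_MARKS = ["\u0304", "\u0301", "\u030c", "\u0300"]

def tone_codes(vowel):
    """Return the decimal codes for `vowel` with tones 1 to 4."""
    codes = []
    for mark in TONE_MARKS:
        # NFC normalization fuses vowel + mark into one precomposed letter.
        letter = unicodedata.normalize("NFC", vowel + mark)
        codes.append(ord(letter))
    return codes

# "\u00fc" is ü; it is an extra, not part of the list above.
for vowel in ["a", "e", "i", "o", "u", "\u00fc"]:
    print(vowel, tone_codes(vowel))
```

For 'a' this prints [257, 225, 462, 224], matching the first four entries in the list.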

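Note: For capital letters, the offset depends on the tone. For the 1st- and 3rd-tone letters above (the numbers over 255), subtract 1 (275 – 1 = 274 for Ē). For the 2nd- and 4th-tone letters (the numbers from 224 to 250), subtract 32 instead (225 – 32 = 193 for Á).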
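
Rather than trusting arithmetic for capital letters, you can have Python look up the capital version and report its code. This is only a small sketch; upper_code is a hypothetical helper name, and it relies on Python's built-in str.upper knowing the Unicode case mappings.

```python
def upper_code(code):
    """Return the decimal code of the capital form of the letter at `code`."""
    return ord(chr(code).upper())

# The four tones of 'a' from the list: ā, á, ǎ, à.
for code in (257, 225, 462, 224):
    print(code, "->", upper_code(code), "offset", code - upper_code(code))
```

Running it shows that the offset is not the same for every letter: ā (257) and ǎ (462) sit 1 above their capitals, while á (225) and à (224) sit 32 above theirs.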

One question I can’t answer thoroughly is what to do if a Chinese word has two vowels: which one gets the accent mark? From what little I know, the letter ‘a’ tends to have priority over all other vowels, while ‘o’ and ‘e’ have priority over ‘i’ and ‘u’. The Wikipedia article has good examples of Chinese written in Pinyin for reference.
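
For what it's worth, the rule as it is commonly taught can be sketched in a few lines of Python: 'a' or 'e' takes the mark if present, otherwise the 'o' in "ou", otherwise the last vowel. This is my own illustration rather than an official algorithm; add_tone is a made-up name, and it assumes a single lowercase syllable that contains at least one vowel.

```python
import unicodedata

# Combining marks for tones 1 to 4: macron, acute, caron, grave.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}
VOWELS = "aeiou\u00fc"  # \u00fc is ü

def add_tone(syllable, tone):
    """Put the mark for `tone` (1-4) on the right vowel of one syllable."""
    syllable = unicodedata.normalize("NFC", syllable)
    if "a" in syllable:
        index = syllable.index("a")    # 'a' always wins
    elif "e" in syllable:
        index = syllable.index("e")    # then 'e'
    elif "ou" in syllable:
        index = syllable.index("o")    # in "ou", the 'o' gets the mark
    else:
        # Otherwise the last vowel gets it (covers iu, ui, io, uo).
        index = max(i for i, ch in enumerate(syllable) if ch in VOWELS)
    marked = syllable[:index + 1] + TONE_MARKS[tone] + syllable[index + 1:]
    # Fuse the vowel and the combining mark into one precomposed letter.
    return unicodedata.normalize("NFC", marked)

print(add_tone("zhou", 1), add_tone("you", 3), add_tone("guang", 1))
print(add_tone("liu", 2), add_tone("gui", 4))
```

The first line prints zhōu yǒu guāng, and the second prints liú guì, where the mark lands on the second vowel in both iu and ui.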



2 Comments on “Chinese Pinyin in HTML Ascii”

  1. arunlikhati says:

    I’m not an expert, but my sense is that the accent falls on the vowel nucleus. If there are two vowels and the first one is either i or u, then these vowels are considered “glides,” and so the second is considered the nucleus, and thus gets the tone mark. If the first vowel is neither i nor u, but the last vowel is i or u (or o), these last vowels are also considered glides (“off-glides”) and so the first vowel is the nucleus and gets the tone mark. (If both first and last vowels are i or u, i.e. iu or ui, then stick with rule #1.) If there are three vowels, then odds are the first and last vowels are glides, so the middle vowel is the nucleus and gets the accent mark. That’s my sense at least. I made a commitment this year to learn Chinese, though I haven’t quite spent as much time at it as I should!

  2. Doug 陀愚 says:

    Oooh! Very nice explanation Arun! Thank you very much. :) I think you hit it on the head, and articulated it a lot better! :D

    谢谢你!

