Chinese Pinyin in HTML ASCII
Posted: February 23, 2011 | Author: Doug 陀愚 | Filed under: Chinese, Language, Technology
A long time ago, I wrote a page about how to correctly display Sanskrit, Pali and Buddhist terms with diacritics in HTML. Compared to older methods, using HTML numeric character codes is a much better, more portable method once you learn the code numbers. These are often called extended ASCII codes, although numbers above 255, such as 257 for ā, are really Unicode code points.
That page was a reference for myself, to fill a particular need, but recently I’ve run into similar problems with Chinese Pinyin. Most academic texts, when rendering Chinese words in English, use the older Wade-Giles system, which I find confusing. When I studied Mandarin in high school, we used Pinyin, which was devised in the People’s Republic of China, largely by a man named Zhōu Yǒuguāng (sounds like “Joe Yo Gwang”). I prefer the Pinyin system, and here’s why:
- The Pinyin system tends to assign unique letters to each sound. Once you know how to read each letter, it’s pretty straightforward and consistent.
- Wade-Giles doesn’t usually have tone marks, but does have confusing apostrophes.
- Pinyin is the official system of the PRC, and thus more useful in today’s global community. At the Olympics, for example, it is used to write the names of Chinese athletes.
- Mandarin Chinese as a whole is pretty easy to pronounce and read (compared to other dialects such as Wu, Yue and Hakka), so you can learn Pinyin very quickly.
So, this post shows how to write Pinyin using HTML character codes. The idea is to display correct Pinyin, with all four tone marks, consistently in HTML. Most letters are the same as in English, so there’s nothing special there, but the tone marks are usually placed on vowels, and these accented vowels need HTML character codes to display properly. Full credit goes to this website for listing the entire ASCII table.
All of these special characters are written in HTML in the same format: &#number;
So, the trick is just remembering which number you want and filling in the blank. For ā, you type an ampersand (&), a hash sign (#), the number 257 and a semicolon: &#257;. For other letters, Pinyin tones and such, just change the number; the rest stays the same.
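If you’d rather not look the numbers up by hand, a few lines of Python can do it for you. This is only a sketch, assuming Python 3; the function names `to_html_ref` and `encode_pinyin` are my own (hypothetical), not part of any library. It relies on the fact that the number in &#number; is simply the character’s Unicode code point, which Python’s built-in `ord()` returns.

```python
import html


def to_html_ref(ch: str) -> str:
    """Return the decimal HTML character reference for one character, e.g. '&#257;'."""
    # ord() gives the character's Unicode code point: the number that goes in &#number;
    return f"&#{ord(ch)};"


def encode_pinyin(text: str) -> str:
    """Replace every non-ASCII character in text with its &#number; reference."""
    # Plain ASCII letters (code points below 128) are left untouched.
    return "".join(ch if ord(ch) < 128 else to_html_ref(ch) for ch in text)


print(to_html_ref("ā"))                # &#257;
print(encode_pinyin("Zhōu Yǒuguāng"))  # Zh&#333;u Y&#466;ugu&#257;ng
print(html.unescape("Zh&#333;u"))      # Zhōu (decoding back, as a browser would)
```

Python’s built-in `xmlcharrefreplace` error handler gives the same result in one line: `text.encode("ascii", "xmlcharrefreplace").decode("ascii")`.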
Pinyin tone marks in HTML ASCII
The list below follows the order of the vowels in English (a, e, i, o and u), with each of the four tones of Mandarin Chinese:
- ā – 257, ‘a’ with the high 1st tone.
- á – 225, ‘a’ with the rising 2nd tone.
- ǎ – 462, ‘a’ with the “dipping” 3rd tone.
- à – 224, ‘a’ with the falling 4th tone.
- ē – 275, ‘e’ with 1st tone.
- é – 233, ‘e’ with 2nd tone.
- ě – 283, ‘e’ with 3rd tone.
- è – 232, ‘e’ with 4th tone.
- ī – 299, ‘i’ with 1st tone.
- í – 237, ‘i’ with 2nd tone.
- ǐ – 464, ‘i’ with 3rd tone.
- ì – 236, ‘i’ with 4th tone.
- ō – 333, ‘o’ with 1st tone.
- ó – 243, ‘o’ with 2nd tone.
- ǒ – 466, ‘o’ with 3rd tone.
- ò – 242, ‘o’ with 4th tone.
- ū – 363, ‘u’ with 1st tone.
- ú – 250, ‘u’ with 2nd tone.
- ǔ – 468, ‘u’ with 3rd tone.
- ù – 249, ‘u’ with 4th tone.
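Note: For capital letters, subtract 1 from the number for the 1st- and 3rd-tone vowels (274 for Ē instead of 275, 461 for Ǎ instead of 462), but subtract 32 for the 2nd- and 4th-tone vowels (201 for É instead of 233, 192 for À instead of 224).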
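Another way to double-check the table, and the capital-letter rule, is to let Unicode build each character from a plain vowel plus a combining tone mark. This is a swapped-in technique, not how the numbers above were collected: it’s a sketch assuming Python 3’s standard `unicodedata` module, where NFC normalization merges the vowel and the mark (macron, acute, caron or grave) into a single precomposed character. The helper name `toned` is hypothetical.

```python
import unicodedata

# Combining diacritics used for the four Mandarin tones.
TONE_MARKS = {
    1: "\u0304",  # combining macron (ā)
    2: "\u0301",  # combining acute  (á)
    3: "\u030C",  # combining caron  (ǎ)
    4: "\u0300",  # combining grave  (à)
}


def toned(vowel: str, tone: int) -> str:
    """Merge a plain vowel and a tone mark into one precomposed character (NFC)."""
    return unicodedata.normalize("NFC", vowel + TONE_MARKS[tone])


# One line per vowel: lowercase and capital code numbers for tones 1-4.
for vowel in "aeiou":
    cells = []
    for tone in (1, 2, 3, 4):
        lower, upper = toned(vowel, tone), toned(vowel.upper(), tone)
        cells.append(f"{lower} {ord(lower)} / {upper} {ord(upper)}")
    print("   ".join(cells))
# First line printed: ā 257 / Ā 256   á 225 / Á 193   ǎ 462 / Ǎ 461   à 224 / À 192
```

The printout makes the pattern visible: the 1st- and 3rd-tone capitals sit 1 below their lowercase forms, while the 2nd- and 4th-tone capitals, which come from the older Latin-1 range, sit 32 below.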
One question I can’t answer thoroughly is what to do if a Chinese syllable has two vowels: which one gets the tone mark? From what little I know, ‘a’ and ‘e’ have priority over all other vowels, and ‘o’ takes the mark in ‘ou’; otherwise the mark goes on the last vowel, so ‘iu’ and ‘ui’ are marked on the second letter (liù, guì). The Wikipedia article on Pinyin has good examples of Chinese written in Pinyin for reference.
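For the placement question, here is a small converter from “numbered” pinyin (the common typing convention, e.g. liu4 for liù) to tone-marked pinyin. It’s a sketch under an assumption: it encodes the usual textbook rule (‘a’ or ‘e’ takes the mark, ‘o’ takes it in ‘ou’, otherwise the last vowel does), and the names `mark_index` and `numbered_to_marked` are hypothetical, not from any library.

```python
import re
import unicodedata

# Combining marks for tones 1-4 (macron, acute, caron, grave).
TONE_MARKS = {"1": "\u0304", "2": "\u0301", "3": "\u030C", "4": "\u0300"}
VOWELS = "aeiouü"


def mark_index(letters: str) -> int:
    """Return the index of the vowel that should carry the tone mark."""
    lower = letters.lower()
    # Rule 1: 'a' or 'e' always takes the mark (they never appear together).
    for v in "ae":
        if v in lower:
            return lower.index(v)
    # Rule 2: in 'ou', the 'o' takes the mark.
    if "ou" in lower:
        return lower.index("ou")
    # Rule 3: otherwise the last vowel takes it (liù, guì, guó).
    positions = [i for i, ch in enumerate(lower) if ch in VOWELS]
    if not positions:
        raise ValueError(f"no vowel in {letters!r}")
    return positions[-1]


def numbered_to_marked(syllable: str) -> str:
    """Convert one numbered syllable, e.g. 'liu4', to tone-marked 'liù'.

    Tone 5 or a missing digit is treated as the neutral tone (no mark).
    """
    m = re.fullmatch(r"([A-Za-zÜü]+)([1-5]?)", syllable)
    if m is None:
        raise ValueError(f"not a numbered pinyin syllable: {syllable!r}")
    letters, tone = m.groups()
    if tone in ("", "5"):
        return letters
    i = mark_index(letters)
    # Put the combining mark right after the chosen vowel, then let NFC
    # merge the pair into one precomposed character.
    marked = letters[: i + 1] + TONE_MARKS[tone] + letters[i + 1:]
    return unicodedata.normalize("NFC", marked)


for s in ["liu4", "gui4", "zhou1", "guo2", "nü3", "ma5"]:
    print(s, numbered_to_marked(s))

# HTML-ready version via Python's built-in xmlcharrefreplace error handler:
print(numbered_to_marked("liu4").encode("ascii", "xmlcharrefreplace").decode("ascii"))  # li&#249;
```

Working one syllable at a time keeps the rule simple; a multi-syllable word like Zhou1 You3guang1 would be split into syllables first.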