Japanese Language, Buddhist Sutras and Ruby Programming

A while back, I talked about my efforts to get a full, liturgical version of the Amitabha Sutra, one of my favorite Buddhist texts, online with both Chinese characters and Japanese-romanized readings. Because the sutra is so long, copying, pasting and writing the HTML by hand is not practical. So, I wrote a Perl script that would parse the romanized text and add all the necessary HTML tags.

Trouble is, I couldn’t make it parse the Chinese characters because they’re UTF-8 encoded, not ASCII text. UTF-8 characters can be multiple bytes long, and using simple tools like split() in Perl can cause a single Chinese character to get split into separate, unusable bytes of gibberish. Perl can process Unicode, but it doesn’t come naturally, and I eventually gave up and tried to copy/paste the Chinese characters by hand for a while, but gave up on that too. The text was just too long.
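To make the multi-byte problem concrete, here is a minimal sketch in Ruby (the language this post ends up using further down), not the Perl I was fighting with. The sample string is taken from the sutra output shown later in this post, and the variable names are just for illustration. It assumes Ruby 1.9 or later.

```ruby
# encoding: UTF-8

# Sample text taken from the output shown later in this post.
text = "等法其土"

# Each of these characters occupies three bytes in UTF-8,
# so the byte count and the character count differ.
puts text.bytesize      # number of bytes
puts text.chars.length  # number of characters

# Cutting at a byte offset that falls inside a character leaves
# a fragment that is not valid UTF-8 (the "gibberish" case).
fragment = text.byteslice(0, 2)
puts fragment.valid_encoding?

# Splitting on character boundaries keeps every character intact.
chars = text.chars
puts chars.all? { |c| c.valid_encoding? }
```

The point is only that any tool working byte by byte can cut a character in the middle, while a tool that knows the encoding splits between characters.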

But lately, after exploring the Python language, I tried to revive this old project, and got much closer. However, Python’s Japanese language text-processing requires modules I couldn’t use on my Linux distribution (Linux Mint), and I decided to try a different language again: Ruby.

Ruby, ironically, was designed by a Japanese developer. It’s designed for English, but still handles UTF-8 a lot more easily, and is a pretty nice language to learn in general. So, after playing on the Web a couple nights, I came up with this amateur script:

# encoding: UTF-8

# Wrap each character of the input file in <td> tags, ten per output line,
# with a spacer cell after the fifth character of each group of ten.
File.open(ARGV[0], "r:UTF-8") do |file|
  file.each_line do |line|
    # chomp drops the trailing newline so it is not wrapped in a <td>
    line.chomp.each_char.with_index(1) do |char, i|
      print "<td>#{char}</td>"
      if i % 10 == 0
        print "\n"
      elsif i % 5 == 0
        print "<td>&nbsp;</td>"
      end
    end
  end
end

If I take the Amitabha Sutra text from the Japanese Wikipedia, copy it into a text file, and remove all spaces and unwanted characters, I have a plain-text file with a long, long string of Chinese characters. Using the script above, I could parse that, and add HTML tags around it like so:

<td>等</td><td>法</td><td>其</td><td>土</td><td>衆</td><td>&nbsp;</td><td>生</td><td>聞</td><td>是</td><td>音</td><td>已</td>
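The same layout can also be sketched by cutting the characters into groups of ten with each_slice. This is only an illustrative alternative under the same assumptions as the script above (UTF-8 input, Ruby 1.9 or later); sutra_rows is a hypothetical helper name, not part of the original script.

```ruby
# encoding: UTF-8

# Hypothetical helper: turns a string of characters into lines of <td> cells,
# ten characters per line, with a spacer cell after the fifth character.
def sutra_rows(text)
  text.each_char.each_slice(10).map do |row|
    cells = row.map { |char| "<td>#{char}</td>" }
    # The spacer follows the fifth character, as in the script above.
    cells.insert(5, "<td>&nbsp;</td>") if row.length >= 5
    cells.join
  end
end

# Prints one line of cells per group of ten characters.
puts sutra_rows("等法其土衆生聞是音已")
```

Returning the lines as an array instead of printing them directly makes it easier to check a line or two before pasting everything into the blog.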

Then, it’s just a matter of copying and pasting each line into the Amitabha Sutra I am writing for the blog! This approach took more work up-front, but saved me weeks, probably months, of copying and pasting each character by hand! At some point, I hope to move on to other sutras as well and get them “stamped out” for liturgical use by other people, but first I want to revise the script to get the Chinese and romanized text all organized into HTML correctly the first time. Then it’s a simple copy-paste right into the blog! :)

I haven’t finished copying this one yet, but already I’ve made a lot more progress than before. As my old boss used to say: work smarter, not harder. He was right. :)

Namu Amida Butsu


Happy New Year Everyone!

Hello all,

The New Year in Japan has had some ups and downs. The New Year’s Eve plans mentioned in the last post didn’t happen, as the little one came down with the flu on her birthday and spent the next few days with a fever and nausea. Some sage advice from Dr. Sears, fluids, lots of rest and so on helped her recover just fine though, so she’s watching the Kōhaku music special belatedly with her aunt and mom, both big fans of Johnny’s boy bands. :)

New Year’s Day was quiet too. The little one was still recovering somewhat, but we did manage to visit a family friend whose family runs a temple. In Japan, smaller branch Buddhist temples are often run by a single family, which is a somewhat unusual system compared to Buddhist institutions elsewhere. Why this is so is long and complicated, but suffice to say that’s how it is in most cases. My wife’s family works in the funeral business making grave markers, so they are not a “temple family” but know a lot of them. Some families and some people are pretty serious about Buddhism, while some are more in it from a business perspective. It really depends on the individual.

Anyway, a particular family we know runs a Jodo Shinshu temple near their house, so we paid a visit in the afternoon. My little one received plenty of otoshidama (お年玉) gifts from neighbors and relatives, while we had a good talk in Japanese about American Buddhism and such. It was hard to converse in Japanese on a more difficult subject like that, but I managed just enough.

I haven’t had a chance to upload pictures of their temple yet, but will soon. Also, I don’t have any clear plans yet, as there are a lot of unknowns, and then I have to return to the US in 5 days. So, the trip so far hasn’t had anything really interesting for readers. I still hope to get in touch with a couple of friends, but apologies to readers for not having anything to post yet. Hopefully later in the week things will turn out better.

Later!


Thanks Everyone and Happy Holidays!

Today is Christmas morning over here, but with my wife and daughter over in Japan, and me working a lot of on-call shifts at work, I am not really doing much. I spent the morning, before my shift, cruising around empty streets in my little Toyota blaring Daft Punk as loud as I could, Starbucks coffee in hand. Pretty stupid, but I just enjoyed the quiet morning, good coffee and good music. Few people may know this, but I am a terrible driver (I didn’t even get my license until my wife was pregnant and I finally had to learn), so I drive only rarely. Think of it as a traditional American-style “Sunday drive” with a few differences. :)

Anyhow, this year has been madness for me. I mean, really really stressful. Between job transitions, 60-hour work weeks in November/December, various certification tests and online courses I studied for (like the JSCC, which I eventually dropped for various reasons¹), trying to be a good husband/father, plus blogs and personal writing projects, I really managed to burn myself out. So, as I sit at home by myself with nothing to do but play classic Final Fantasy games on the PlayStation 1 (more on that in a later post), I realize that I am exhausted. I wanted to work on other projects while I had the free time, but just couldn’t bring myself to do it. All I wanted to do this week was sit and space out and play games. I guess I am more tired than I thought I was.

Blog readers here have been a very patient bunch with my ups and downs, rants toward Buddhism, frequent changes in blog schedule and so on. For that, I want to extend a heartfelt “thank you” to everyone who reads this blog.

In spite of all the craziness, I do enjoy writing here and interacting with people. I learn lots of good things from readers, and the encouragement does make a difference during those horrid weeks when I feel overwhelmed. One kind fellow even recommended me to the Blogisattva Awards, which was greatly appreciated though somehow I didn’t “get the memo” until too late. Apologies for not noticing that sooner. :p

So, Tuesday I’ll be off to Japan to spend time with my wife, and celebrate my little girl’s 4th birthday with her extended family there. I’ll be happy to see them again, and also catch up with some friends there too (blog readers, plus coworkers). I don’t know how much writing I will be doing for the next few weeks, so things will be crazy. I might have to consider a longer hiatus too, but let’s see how other things turn out.

As for today, I’ve got toilets to clean, fires to put out at work (figuratively speaking) and laundry to fold. Boring is good sometimes. :)

Happy Holidays!

1 I had the option of taking a “leave of absence” and coming back when I was less busy, but I explained to them that I was also dissatisfied with the lack of structure to the course and the lack of milestones to measure progress (e.g. “After taking course X I should be able to know A, B and C…”). After a positive discussion on the subject, we agreed to disagree and I opted to just drop the course. I felt that it was not a good investment of my time when other things like the JLPT have a more tangible benefit and means of measuring progress, as did the RHCE Linux certification.

