Word Count in Oriental Languages

Today you’ll learn about the standards and peculiarities of word count in oriental languages. I decided to write about them separately, since they differ greatly from other languages.

Chinese. The writing unit in Chinese is the character (hanzi). The main difficulty for word count is that characters are not separated by spaces. This means that the Chinese sentence «这是鸟» ("This is a bird", 3 words) is counted as a single word if the word count tool counts words based on the spaces between them (there was even a related query on the WordPress support page).
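To make the problem concrete, here is a minimal Python sketch. It assumes a naive counter that simply splits on whitespace; the function name is my own illustration, not the API of any particular word count tool.

```python
def count_words_by_spaces(text: str) -> int:
    """Count words the naive way: split the text on runs of whitespace."""
    return len(text.split())


english = "This is a bird"
chinese = "这是鸟"

print(count_words_by_spaces(english))  # 4
print(count_words_by_spaces(chinese))  # 1, although the sentence has three words
```

The Chinese sentence contains no whitespace at all, so the split returns it as one piece.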

But if you think that the 3 characters «工业化» are also a separate sentence, you are wrong: together they form a single word, “industrialization”. So the most logical way to evaluate text volume in Chinese is character count. For example, a 1,000-word English text translated into Chinese will be 1,300-1,800 characters long. You may read more about the English->Chinese word count ratio here.
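As a rough sketch of how this could be put into practice, the Python snippet below counts characters while ignoring whitespace and turns an English word count into an estimated Chinese character range. The function names are hypothetical, and the default ratios 1.3 and 1.8 are just the 1,300-1,800 range mentioned above, not a fixed standard. Note that this simple counter also counts punctuation marks as characters.

```python
from typing import Tuple


def count_characters(text: str) -> int:
    """Count characters, ignoring spaces, tabs and line breaks."""
    return sum(1 for ch in text if not ch.isspace())


def estimate_chinese_characters(
    english_words: int, low: float = 1.3, high: float = 1.8
) -> Tuple[int, int]:
    """Estimate the Chinese character range for an English word count.

    The default ratios are an assumption taken from the range quoted
    in the text above; adjust them for your own material.
    """
    return round(english_words * low), round(english_words * high)


print(count_characters("工业化"))          # 3 characters, one word
print(estimate_chinese_characters(1000))  # (1300, 1800)
```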

Japanese. Japanese is written in a mixture of three main systems: Chinese-derived characters (kanji) and two syllabaries, hiragana and katakana. This makes word count even more complicated than in Chinese. So the usual word count scheme in Japanese is based on characters without spaces, which seems quite logical.
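The mix of scripts can be seen in a short Python sketch. It assumes that the Unicode character names from the standard `unicodedata` module are a good enough way to tell the three systems apart; the function name and the sample sentence are my own illustration. The total is the character count without spaces.

```python
import unicodedata
from collections import Counter


def script_breakdown(text: str) -> Counter:
    """Count non-space characters per Japanese writing system."""
    counts = Counter()
    for ch in text:
        if ch.isspace():
            continue  # spaces are not part of the character count
        name = unicodedata.name(ch, "")
        if name.startswith("CJK UNIFIED IDEOGRAPH"):
            counts["kanji"] += 1
        elif name.startswith("HIRAGANA"):
            counts["hiragana"] += 1
        elif name.startswith("KATAKANA"):
            counts["katakana"] += 1
        else:
            counts["other"] += 1  # punctuation, Latin letters, digits
    return counts


sentence = "私はコーヒーを飲みます"  # "I drink coffee"
breakdown = script_breakdown(sentence)
print(breakdown)
print(sum(breakdown.values()))  # total characters without spaces
```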

Korean. Modern Korean is written with spaces between words (unlike Chinese or Japanese). Traditionally, Korean was written in columns from top to bottom, right to left, but it is now usually written in rows from left to right, top to bottom. This means that the traditional word count scheme, in which words are counted on the basis of spacing, can be applied.

Other. Among the other Asian languages covered here, Thai is the one written without spaces between words, so the job estimate is based on character count. The remaining languages, including the Indian languages (Bengali, Gujarati, Marathi, Urdu, Odia (Oriya), Tamil, etc.), Indonesian, Farsi, Arabic, Turkish and Hebrew, use spacing, which means that words can easily be counted with a word count tool.

To sum up. Languages that don’t have spacing and require character count include Chinese, Japanese and Thai. The rest of the oriental languages use spacing and are measured by word count instead of character count.
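The summary can be sketched as a small Python helper that picks the counting method by language. The language codes ("zh", "ja", "th") and the function name are assumptions made for illustration; a real tool would need its own language detection and its own rules for punctuation.

```python
from typing import Tuple

# Languages from the summary above that are measured in characters.
CHARACTER_COUNTED = {"zh", "ja", "th"}


def text_volume(text: str, language: str) -> Tuple[int, str]:
    """Return the text volume and the unit it is measured in."""
    if language in CHARACTER_COUNTED:
        # No spacing between words: count characters, ignoring whitespace.
        return sum(1 for ch in text if not ch.isspace()), "characters"
    # Spacing between words: count whitespace-separated words.
    return len(text.split()), "words"


print(text_volume("这是鸟", "zh"))           # (3, 'characters')
print(text_volume("이것은 새입니다", "ko"))   # (2, 'words')
print(text_volume("This is a bird", "en"))  # (4, 'words')
```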


A.C.T. 05-03-2017, 07:05

Thank you for your great points about word / character count; they largely coincide with mine! 🙂

I have been working as a translator in the English-Chinese language pair, and am naturally keen on checking the ratio between the Chinese character count and the English word count. I can suggest with a reasonable level of certainty that 1,000 Chinese characters translate into approximately 600-700 English words, and 1,000 English words into about 1,500-1,700 Chinese characters, varying with the nature of the source content and the target writing style.


