Technology

Arabic on the Kindle Voyage

I commute to work each day by bus, about an hour or so each way. Perfect reading time, and my Kindle Voyage is an excellent companion. As I look around at my fellow passengers, I see the majority of them messing about with games on their phones, a few reading “real” books, and others who also seem to like the Kindle’s paperwhite screen. Two of the ladies I see on a relatively regular basis read in languages other than English – I haven’t been rude enough to interrupt and ask, but from a distance, it looked like Japanese. Frankly, I’m a bit jealous. So I periodically go digging and try to read some Arabic on my Voyage.

I’ve noted before that the Kindle Voyage appears to be getting close to a good display of Arabic. You have to sideload the Arabic documents somehow – I usually email them to my Kindle address. I recently had a set of web articles (book reviews) that I wanted to read, and I prefer to read on the paperwhite screen. Someday I’ll have enough ‘free money’ lying around that I’ll be able to buy an Android tablet with an e-ink screen so I can just install all of the normal apps – including MS Office apps – directly on it and read files natively. Someday. Until then, I do a lot of “send to Kindle” either through email or from my web browser. In this instance, I copied several articles from the ‘net, put them into a single MS Word document, and formatted it for printing. Then I thought I should just send it to my Kindle, running version 5.9.4 as of this writing, and see what happens.

At first glance, you probably thought the same thing I did: That looks pretty good! Let me pile on a couple of other “pretty good” things before I start on what makes it painful to really use. If you long-press a word, the Kindle highlights it and, if you are connected to wifi, looks for a Wikipedia article. You can also tap the “translation” tab and get an actual lookup (though the dictionary tab doesn’t work – that depends on an actual dictionary being installed, which is not supported for Arabic yet). You can see here that the translation (for the Arabic word سلطة), for some reason, shows up on top of the text that says “Translating your selection.” If you swipe back to Wiki or the dictionary and then back to translation, the ghost text disappears. The translations come from Bing Translator.

Then we get into the nitty-gritty, the stuff that starts to get to you after you’ve been reading for an hour – or sometimes has you puzzling over a word, getting out of your comfy reading chair and going back to the PC to figure it out.

The first one is the text alignment. Those of you who don’t read Arabic probably don’t care about this post anyway, but if you are reading, I’ll do my best to include you. Arabic is a right-to-left language. That means the text starts on the right-hand side of the page and words progress to the left, with most letters connected sort of like cursive handwriting in English (but with some very interesting rules). If you look at this “big picture” carefully, you’ll notice that the text is nice and even down the left-hand side of the page, and a bit random down the right-hand side. That is not unusual for English documents, but for Arabic, it should be the other way around – or at least fully justified and square down both sides. This hints that the Kindle isn’t fully bi-di (bidirectional text) aware. However, when I went to a location with an English word integrated into the text, it was in the right place and did not break up the word order. So the problem likely lies with alignment rather than text direction or flow, and ought to be a relatively easy fix. Of course, I don’t know what the code behind the Kindle display looks like – and I’m not a developer – so it could be enormously complex and I’m an idiot. I’ll accept that.
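For the curious, here is a rough Python sketch of how a renderer could pick a default alignment from the text itself. It is only an illustration of the "first strong character" idea from the Unicode bidirectional algorithm, heavily simplified, and it says nothing about how the Kindle actually works. The function names `base_direction` and `default_alignment` are my own inventions for this example.

```python
import unicodedata


def base_direction(text: str) -> str:
    """Guess paragraph direction from the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # Hebrew-type or Arabic-type letter
            return "rtl"
        if bidi_class == "L":  # Latin-type letter
            return "ltr"
    # Digits, spaces and punctuation are not "strong", so fall back to LTR.
    return "ltr"


def default_alignment(text: str) -> str:
    """Hypothetical rule: right-align RTL paragraphs, left-align everything else."""
    return "right" if base_direction(text) == "rtl" else "left"


print(default_alignment("فور أن تنطق"))
print(default_alignment("Kindle Voyage"))
```

An Arabic paragraph with an English word in the middle still comes out as right-to-left under this rule, because only the first strong character counts. That matches what I saw: the mixed text flowed correctly, and only the alignment was off.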

Second, I’m not a fan of the font. I’ve gone into the display settings and tried all of the available fonts; the spacing between characters and lines changes when you change fonts, but the actual display font does not change. This font is a bit open and loopy and reminds me of Moroccan script. Not as embellished as something handwritten, but some similarities. Let’s take a closer look at some of my nit-picking.

The font is dynamic, like all fonts on the Kindle. Certain sizes seem to have some spacing issues. I usually have mine set at a 7, and I found this. The m م (indicating the Gregorian year) should not be under the 2, and the close parenthesis should not be over the 2 either. Both problems go away if I bump up to 8 or down to 6.

Before we go on to another image, let me pick on a couple of things in the التي as well. There is a left hook at the top of the lam that I’m not a big fan of, but the dots of the ya’ touching the bottom of the letter are even more annoying to me.

Another thing that I’ve noticed is that the diacritical marks – which aren’t often used – turn hideous when they are. Take a look at this tanween – two kasras (for the -in sound) on موازٍ here. It looks more like someone has crossed out the z.

In this next example, you can take a look at how the final ت is connected to the ط. There is a little loopy shape between the ط and the ت … oh, wait! I just checked the original Word document for this, and it turns out the phrase is فور أن تنطق, and that last letter is not a ت but a ق. Interesting; the ن goes below the line of script a little, why not a ق? That dip below the line is one of the distinguishing characteristics of that letter.

And I guess this answers my next question, which was: what letter is connected to the T ت here? None; it is meant to be a ق, and the circle that is crowding it out from the right is an m م, but this is almost impossible to tell because there is essentially no space between the two letters. Before doing all of this digging, and looking back at the document on my PC, I thought the word was ترهت. According to the Word document, this phrase is أن ترمق العالم من علٍ. This phrase also gives me the chance to talk about the ayn ع and ghayn غ – in most fonts (including the one you are likely using) a medial ayn/ghayn has a flattish top to distinguish it from the more round fa/qaf ف/ق shapes: ـعـغـ vs. ـفـقـ. In this font, the ayn in العالم (the third word from the left) is round and solid and a bit smaller than the fa shape. In this case, I was able to tell what the letter was meant to be; I just don’t care for its design.

If someone were to ask me, and no one has, I would recommend using an open source font, something available under the GPL that is both traditional and easy to read. This provides for easy scaling and no problems with licensing. One very nice option might be KACSTBook, shown here (using the sampling tool at https://fontlibrary.org).

I also like the free Dubai font. If the folks at Amazon chose the font they are using because someone likes that more modern and rounded aesthetic, I think the Dubai font does it better and remains extremely readable. Here is a sample, using the same text. Note how all the letters are clear, and the tanween is visible without covering the tail of the lam ل.

I’ve spent a lot of time picking on the font. I sincerely hope Amazon picks something else before Arabic support is “officially” announced, because I think the current font makes certain letter combinations almost impossible to puzzle out – if I hadn’t been able to go back to the Word document on my PC I would never have guessed ترمق for that word!
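To underline that this is purely a rendering problem, here is a small Python sketch that compares the word in the Word document (ترمق) with my misreading of it on the Kindle (ترهت), letter by letter. The helper names are made up for this example. The underlying text is unambiguous; only the glyphs on the screen made the two look alike.

```python
import unicodedata


def letter_names(word: str) -> list[str]:
    """Return the official Unicode name of each character in the word."""
    return [unicodedata.name(ch) for ch in word]


def differing_positions(a: str, b: str) -> list[int]:
    """Return the indexes (in logical, typed order) where two words differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


intended = "ترمق"  # what the Word document contains
misread = "ترهت"   # what I thought I saw on the Kindle

for i in differing_positions(intended, misread):
    print(i, unicodedata.name(intended[i]), "vs", unicodedata.name(misread[i]))
```

The two words share their first two letters and differ in the last two, which is exactly where the cramped meem and the tail-less qaf caused trouble.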

With a change of font and better alignment options, we’ll have a winner. It would be awesome to have a dictionary for purchase and an Arabic keyboard to do searches, but even without those things, it becomes a very welcome reading tool.

Some Tech Notes

I know almost everything I’ve posted lately has been reblogs, and I’m a bit sorry about that. I’d love to put up more unique content. My life is about to get even busier, so I don’t know if that will change. But today, I’d like to share a couple of tech notes.

First, one of my computers is an older model Dell Venue 8 Pro with a 32 GB drive and 2 GB of RAM. Dell makes a decent active stylus for their tablet systems (the stylus also works with my main computer, an Inspiron 7352!). The Windows 10 Anniversary update came out a couple of weeks ago, and it included some nice updates for stylus-enabled systems. Being more of a geek than is healthy for me, I really wanted to install it on my tablet. The problem is that, even after removing everything but the operating system – and even using some admin tools to strip the OS to bare bones – I was never able to free enough space to get the installer to work. It says it wants 16 GB of free space, which is tough to accomplish when the 32 GB drive has 23 GB of usable space and the OS takes up around 8 GB. Even when I got to 17 GB free, the installer failed.
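If you want to see where you stand before fighting with the installer, here is a minimal Python sketch of the arithmetic. The 16 GB figure is what the installer told me; the function names are my own, and, as noted above, having the space on paper did not guarantee the installer would actually run.

```python
import shutil

REQUIRED_GB = 16  # free space the installer asked for, per my experience above


def free_gb(path: str = ".") -> float:
    """Free space, in GB, on the drive that holds the given path."""
    return shutil.disk_usage(path).free / 1024**3


def shortfall_gb(usable_gb: float, used_gb: float,
                 required_gb: float = REQUIRED_GB) -> float:
    """How many more GB must be freed (0 if there is already enough)."""
    return max(0.0, required_gb - (usable_gb - used_gb))


# My tablet: 23 GB usable, roughly 8 GB taken by the OS.
print(shortfall_gb(usable_gb=23, used_gb=8))
print(f"Free on this drive: {free_gb():.1f} GB")
```

With 23 GB usable and about 8 GB used by the OS, the drive comes up short of the requirement, which is why stripping the OS down was the only way to get anywhere near it.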

So. I downloaded the installation image creation tool from Microsoft and did a clean Windows 10 install (no Dell preloaded software). This took forever, and the OS I ended up with was not the Anniversary update. After running all of the system updates, I grabbed a fresh copy of the installation image creation tool and did another clean install. This time I got the update! Hooray 🙂

If you are going through this, don’t give up. You can make it work.

Windows 10 image creation tool (use “Download Tool Now”)
Freeing up space in Windows 10: Article 1, Article 2
Dell USB adapter/power adapter – so you can charge the battery & use a keyboard and mouse at the same time. (I got mine on eBay a bit cheaper)

Now, to get a copy of the most recent MS Office and a drawing glove for using OneNote for Sketchnotes.

Arabic writers, why so bold?

I have a bit of a rant.

First, I would like you to do me a favor and go visit this page: http://goo.gl/lyylAu. It is a technology blog in Arabic. Even if you can’t read it, please just take a look. Now, trot along with me to another page, this one a web forum (nominally dedicated to Harry Potter fandom, but this article is about an interview with John Hurt): http://goo.gl/SEqdjg. And another forum posting here (this time with a light background) about protecting the vBulletin admin control panel from hackers: http://goo.gl/4oyNUz

What do you see? The more “professional” technology blog used a few different fonts, used bold text for titles, and it looked… normal. To me, at any rate. The two web forums? Everything is bold.

Why so bold?

I’ve seen this since I first started seeking out Arabic on the internet in around 1996 (back then you had to use Arabic Windows or the Arabic Language Kit on a Mac to see Arabic; now it just works. Thanks, Unicode!). For some reason that I do not understand – and I would really like to hear an answer – many Arab writers seem to prefer bold text. I do not. I think it is the Arabic equivalent of TYPING IN ALL CAPS. I FEEL AS THOUGH I AM BEING SHOUTED AT. I don’t care for it.

I was willing to occasionally complain to friends or coworkers and then let it go, but then I started reading In Other Words: A Coursebook on Translation by Mona Baker. The book is very good so far. As I’ve mentioned, I am guiding a book club of like-minded translators through it. We have reached the sample texts in chapter 2:

With the above proviso in mind, we can now look at examples of strategies used by professional translators for dealing with various types of non-equivalence. In each example, the source-language word which represents a translation problem is underlined. The strategy used by the translator is highlighted in bold in both the original translation and the back-translated version. (2011, p. 23)

Sounds good, nice and clean, and easy to follow. In this section, there are examples in Arabic, Spanish, Chinese, Italian, Japanese, Greek and a few others. Rather than jump straight to the punchline, I’ll share a German example from page 33 (as an image):

[Image: German sample]

Looks great! I do not speak German, but I can follow right along. The same goes for Spanish, Greek, and the other Latin-based scripts. But for Arabic… Can you guess?

[Image: Arabic sample]

Anyone have a guess as to where “put” is in the Arabic sentence? If you do not read Arabic, here is a hint: the whole Arabic statement is in bold text. For the curious, here it is (I have used a red box to mark “put”):

[Image: Arabic sample, marked]

Now, in the first edition of the book – published in 1992 – typesetting mixed Arabic and English was a bit more difficult than it is today. I could forgive the typewriter-like font and bold-appearing text back then; it was just nice to have Arabic samples in an English book on translation. Now, however, things are a lot easier (thanks again, Unicode!). Anyone with a copy of Microsoft Office, OpenOffice (or its sisters), or InDesign (for real DTP) can properly lay out Arabic text. I am not an expert in Arabic typography or DTP, though I am trying to learn more about both, but I can tell a bold font from one that is not. On top of that, the Arabic used in the cover art is broken… It is disconnected (this is from the web page, but the graphic is the same):

[Image: Arabic cover]

The Arabic word here is supposed to be لغة, language (لغة for those who prefer bold…).

Oh, and by the way, I do not read Chinese or Japanese, but I think the book has the same problem for both of those languages. Here is an example:

[Image: Chinese sample]

So, again I ask: why so bold?

 

Why I still don’t use Translation Memory

I recently received an email asking me about an article I wrote in 2005 titled Why I Don’t Use Translation Memory. First, they wanted to know if I had changed my mind, and second, if there was any more I had to say on the topic.

In essence, the answer is simple: no, I have not changed my mind. The article is still valid. Here are the reasons why I still don’t use translation memory tools:

1- They are expensive! Most TM tools range from $900 – $1500 for the freelance user versions.
2- In my freelance work – which is a relatively small amount, since I hold a full-time job – most of the work I get is either hard copy or an image of hard copy. The state of Arabic Optical Character Recognition is such that I would not want to rely on it for production use. If anyone knows of any cheap, reliable OCR, let me know!

Two reasons. That is about all there is to it. Now, I have looked into using open source translation memory tools. Specifically, I have looked closely at OmegaT and Anaphraseus. These products both have a lot of potential, and I will be experimenting with them more in the future. Since they take cost out of the picture, they might change my mind. The best part is they both run on Windows, Mac, and Linux/FreeBSD systems.