Anyway, when I started to look at this spreadsheet in some detail, I noticed a problem. The cell contents displayed as Dhivehi, but in the edit bar, i.e. what was actually stored in the cell, they were in western (Latin) characters.
The contents of cell A10 are displayed as Dhivehi but input and stored as Latin characters.
It's taken a few weeks to fully understand the what, why and how of this, which I think I now do. There are a number of strands to this.
First is the keyboard mapping, that is, the correspondence between the characters (and ASCII codes) on the physical keyboard and the Dhivehi characters they produce when that language option is selected.
Next, the font we are using, P_Faruma, along with a number of other Dhivehi fonts, will display both the Unicode Thaana character set and the ASCII character set as the equivalent Thaana characters. A consequence of this is that you can't display a Latin-script language, such as English, with these fonts. Some of the standard Windows TrueType fonts, such as Arial, show each character only as its own Unicode glyph (as far as I can see), so that ASCII 65 is always A, whereas Faruma displays ASCII 65 as the equivalent Thaana character on the Dhivehi Phonetic keyboard layout (selected through Regional Settings), which is ާ or Aabaafili, the double a sound and Unicode character 07A7.
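As a quick check of the numbers involved, here is a minimal sketch in VB6/VBA syntax. The routine name ShowAabaafili is made up for illustration. It assumes only that capital A is character code 65 and that Aabaafili is Unicode 07A7, which is 1959 in decimal.

```vb
' Hypothetical demo routine, not part of the converter further down
Sub ShowAabaafili()
    Debug.Print AscW("A")            ' character code of capital A: 65
    Debug.Print Hex$(1959)           ' 1959 decimal in hex: 7A7
    Debug.Print AscW(ChrW(&H7A7))    ' round trip through ChrW: 1959
End Sub
```

Run it from the Immediate window. A font like Faruma draws the same Thaana glyph for both character 65 and character 1959, which is why the cell looks right even though it stores Latin text.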
Dhivehi Phonetic (soft) keyboard layout
As I've already noted, this mapping of Latin to Thaana characters isn't fixed and depends on the keyboard mapping. My first introduction to this was on a blog by Jawish Hameed, who published the Java source for a transcoder he has developed to convert between Latin and Thaana. When I used his mappings in a VB program to do a similar job, I only got about an 80% match between the Latin and Thaana in the spreadsheet data, with quite a few of the filis (vowels) being transposed. My first suspicion was that maybe this was a Java thing, or perhaps Linux keyboards use a different mapping. Then again, it may have nothing to do with the OS but relate to the mapping of the Latin characters on the particular physical keyboard to the Dhivehi ones. For instance, my keyboard is a UK QWERTY layout, which for the main alphabet characters is the same as US QWERTY but would be different to, say, a German or French keyboard. The mapping I describe below is valid for a US/UK QWERTY keyboard and Windows Dhivehi Phonetic (as far as I know!)
At this point I set about documenting the mapping for Dhivehi Phonetic on a QWERTY keyboard, and once I had modified my VB code for this, I got a 100% match between the Latin-based Dhivehi displayed in the spreadsheet cells and the true Unicode Dhivehi. The mapping is shown below. There are a few special characters to deal with. Dhivehi reads right to left, and the Dhivehi equivalents of question mark, comma and semi-colon are mirrored, as are the open and close brackets. In common with Arabic, numbers run left to right whilst dates are read right to left and so are formatted as yyyy mmm dd.
A big thanks must go to Jaa. The code below is mine, not his, but the initial mapping in his code set me on the right track.
Some other useful resources are:
http://tlt.its.psu.edu/suggestions/international/bylanguage/thaanachart.html
http://en.wikipedia.org/wiki/T%C4%81na
http://unicode.org/charts/PDF/U0780.pdf
The following is good in VB6, VB.net and Access Basic. This version only handles Thaana strings without any numbers: the string is reversed one character at a time, so the digits of any number in it will get reversed too. I have another version which does a more complete job, though one thing I have yet to crack is coding for the /- used in the representation of currency values.
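The digit problem comes from walking the string backwards one character at a time. One possible way round it, sketched here and not taken from my fuller version, is to collect each run of digits and emit it in its original order. The function name ReverseKeepingDigitRuns is hypothetical, and the sketch assumes that numbers are stored in normal left-to-right reading order and contain only the characters 0 to 9.

```vb
' Hypothetical helper: reverse a string, but keep every run of
' ASCII digits (0-9) in its original left-to-right order.
Function ReverseKeepingDigitRuns(ByVal strIn As String) As String
    Dim strOut As String
    Dim strRun As String   ' digits collected for the current run
    Dim c As String
    Dim i As Integer

    strOut = ""
    strRun = ""
    For i = Len(strIn) To 1 Step -1
        c = Mid$(strIn, i, 1)
        If AscW(c) >= 48 And AscW(c) <= 57 Then
            ' We are walking backwards, so prepend to rebuild the run in order
            strRun = c & strRun
        Else
            ' A non-digit ends the run: flush the run, then the character
            strOut = strOut & strRun & c
            strRun = ""
        End If
    Next i
    ' Flush a run that reaches the start of the input
    ReverseKeepingDigitRuns = strOut & strRun
End Function
```

For example, "ab12cd" comes back as "dc12ba" and "2010" comes back unchanged. Separators inside a number, such as a decimal point or the /- of a currency value, still split the run, so "12.5" would come back as "5.12".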
Function ThaanaAsciiToUnicode(ByVal strIn As String) As String
'copyright (C) 2010 Tony Bennett
'You are free to use the code for personal or commercial purposes as long as you retain the copyright notice
'The phonetic keyboard layout mappings are:
'h' -> '1920','S' -> '1921', 'n' -> '1922', 'r' -> '1923', 'b' -> '1924', 'L' -> '1925',
'k' -> '1926', 'w' -> '1927','v' -> '1928', 'm' -> '1929',
'f' -> '1930', 'd' -> '1931', 't' -> '1932', 'l' -> '1933',
'g' -> '1934', 'N' -> '1935', 's' -> '1936', 'D' -> '1937', 'z' -> '1938', 'T' -> '1939',
'y' -> '1940', 'p' -> '1941', 'j' -> '1942', 'c' -> '1943',
'X' -> '1944', 'H' -> '1945','K' -> '1946', 'J' -> '1947',
'R' -> '1948', 'C' -> '1949', 'B' -> '1950', 'M' -> '1951',
'Y' -> '1952', 'Z' -> '1953', 'W' -> '1954', 'G' -> '1955',
'Q' -> '1956', 'V' -> '1957','a' -> '1958', 'A' -> '1959',
'i' -> '1960', 'I' -> '1961', 'u' -> '1962', 'U' -> '1963',
'e' -> '1964', 'E' -> '1965', 'o' -> '1966', 'O' -> '1967',
'q' -> '1968',
'plus the special characters
',' -> '1548', ';' -> '1563', '?' -> '1567', ')' -> '0041',
'(' -> '0040', 'Q' -> '65010'
'note that ? , ; ( ) are mirrored in thaana
'note also that 'Q' is already in the main table above ('1956'), which is checked first
Dim strOut As String
Dim c As String
Dim i As Integer
Dim j As Integer
'Dhivehi Phonetic <-> QWERTY mappings
Const AsciiChars1 = "hSnrbLkwvmfdtlgNsDzTypjcXHKJRCBMYZWGQVaAiIuUeEoOq"
'The special characters
Const AsciiChars2 = ",;?)(Q"
strOut = ""
If Len(strIn) > 0 Then
    'check if any chars are above the single byte range (code > 255).
    'if they are, assume this is unicode dhivehi and skip the conversion
    For i = 1 To Len(strIn)
        If AscW(Mid$(strIn, i, 1)) > 255 Then
            ThaanaAsciiToUnicode = strIn
            Exit Function
        End If
    Next i
    'walk the input backwards, so the output is the reverse of the input
    For i = Len(strIn) To 1 Step -1
        c = Mid$(strIn, i, 1)
        'need to do a case sensitive instr
        j = InStr(1, AsciiChars1, c, vbBinaryCompare)
        If j > 0 Then
            'position j in AsciiChars1 maps to code point 1919 + j
            strOut = strOut & ChrW(1919 + j)
        Else
            j = InStr(1, AsciiChars2, c, vbBinaryCompare)
            If j > 0 Then
                Select Case j
                    Case 1 ',
                        strOut = strOut & ChrW(1548)
                    Case 2 ';
                        strOut = strOut & ChrW(1563)
                    Case 3 '?
                        strOut = strOut & ChrW(1567)
                    Case 4 ')
                        strOut = strOut & ChrW(41)
                    Case 5 '(
                        strOut = strOut & ChrW(40)
                    Case 6 'Q (not reached while Q is also in AsciiChars1)
                        strOut = strOut & ChrW(65010)
                End Select
            Else
                'anything unmapped is passed through unchanged
                strOut = strOut & c
            End If
        End If
    Next i
End If
ThaanaAsciiToUnicode = strOut
End Function
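To check the result of a conversion it can help to look at the code points, since not every window or control will draw Thaana. The helper below is a minimal sketch for that; the name CodePointList is hypothetical and it is not part of the converter.

```vb
' Hypothetical helper: return the decimal code point of each
' character in a string, separated by single spaces.
Function CodePointList(ByVal strIn As String) As String
    Dim strOut As String
    Dim i As Integer

    strOut = ""
    For i = 1 To Len(strIn)
        If i > 1 Then strOut = strOut & " "
        ' AscW returns a signed 16-bit value in VB6, so mask it to 0..65535
        strOut = strOut & CStr(AscW(Mid$(strIn, i, 1)) And &HFFFF&)
    Next i
    CodePointList = strOut
End Function
```

With the mapping listed in the comments above, an 'h' in the input should show up as 1920 and an 'A' as 1959 in the list for the converted string.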