InsertUTF8TextByUsingAnHTMLwindow

How to insert UTF-8 encoded text by using an HTML output window

by Hans-Jörg Bibiko

The task is to write a tmCommand that provides the user with a kind of toolbar for inserting text snippets into the frontmost document. Such a toolbar can be implemented as a tmCommand that displays an HTML window. To insert text, the HTML window uses TextMate's JavaScript bridge function TextMate.system(), which runs the AppleScript command:

tell application "TextMate" to insert "TEXT".

By doing so the following problems arise:

  • inserting UTF-8 text (compatibility between Mac OS X 10.4.x and Mac OS X 10.5.x)
  • inserting text with '\n' (new lines)
  • avoiding the decomposition of accented characters caused by TextMate.system() (for instance: 'ü' will be inserted as 'u' plus combining '¨')

To solve all of the above-mentioned problems, the following approach can be used (written as a Bash shell script):

cat<<-HTML
<html>
<head>
<script type="text/javascript" charset="utf-8">
function encode_utf8(raw) {
	// Returns the UTF-8 bytes of raw as a string of two-digit hex values.
	raw = raw.replace(/\r\n/g,"\n");
	function hex(b) {
		// always emit two digits (a tab must become "09", not "9")
		return (b<16 ? "0" : "") + b.toString(16);
	}
	var utftext = "";
	for(var n=0; n<raw.length; n++)
	{
		var c=raw.charCodeAt(n);
		if (c<128)
			utftext += hex(c);
		else if (c<2048) {
			utftext += hex((c>>6)|192);
			utftext += hex((c&63)|128);
		}
		else {
			// note: surrogate pairs (characters above U+FFFF) are not handled
			utftext += hex((c>>12)|224);
			utftext += hex(((c>>6)&63)|128);
			utftext += hex((c&63)|128);
		}
	}
	return utftext;
}
function insert(str) {
	var cmd  = "echo -en ";
	cmd += "'tell app \"TextMate\" to insert ";
	cmd += "«data utf8" + encode_utf8(str) + "» as Unicode text\n";
	cmd += "tell app \"System Events\" to keystroke \"\`\" using (command down)' ";
	cmd += "| iconv -f UTF-8 -t MACROMAN | osascript --  &";
	TextMate.system(cmd, null);
}
</script>
</head>
<body>
	<button onclick="insert('Übrigens:\n私は生徒です。')">INSERT</button>
</body>
</html>
HTML

Explanations:

Under Mac OS X 10.5.x one could simply write the following JavaScript function, because Leopard's AppleScript supports UTF-8 encoded text:

function insert(str) {
	TextMate.system("osascript -e 'tell app \"TextMate\" to insert \"" + str + "\"' &", null);
}

This approach cannot be used on Mac OS X 10.4.x machines (no direct UTF-8 support), and it decomposes accented characters in str ('ü' will be inserted as 'u' plus combining '¨').
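To make the decomposition concrete, here is a small illustration of the difference between the precomposed and the decomposed form of 'ü'. It assumes a JavaScript engine that provides String.prototype.normalize() (ES2015), which TextMate's HTML window may not have, so it is meant to be run elsewhere, for example in Node.js. The helper codePoints() is a hypothetical name, and NFD is used only to reproduce the effect; the text does not say which normalization TextMate.system() actually applies.

```javascript
// Illustration only: reproduces the 'u' + combining diaeresis effect with NFD.
// Assumption: String.prototype.normalize() is available (ES2015 or later).
var composed = "\u00fc";                    // 'ü' as a single code point
var decomposed = composed.normalize("NFD"); // 'u' followed by U+0308

// Hypothetical helper: lists the UTF-16 code units of a string as U+XXXX.
function codePoints(s) {
  var out = [];
  for (var i = 0; i < s.length; i++) {
    var h = s.charCodeAt(i).toString(16).toUpperCase();
    out.push("U+" + ("0000" + h).slice(-4));
  }
  return out.join(" ");
}

console.log(codePoints(composed));   // U+00FC
console.log(codePoints(decomposed)); // U+0075 U+0308
```

Both strings look the same on screen, but the document ends up containing two code points instead of one.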

To be compatible with Mac OS X 10.4.x and to avoid the decomposition, one can use AppleScript's «data utf8» hex byte stream class. An example:

TextMate.system("osascript -e 'tell app \"TextMate\" to insert «data utf8c3bc»' &", null);

will insert the letter 'ü' as one code point ('ü' is encoded in UTF-8 as 'c3bc').

In other words, one needs a function that converts the text into a string containing its UTF-8 representation as hex bytes. The JavaScript function encode_utf8() performs this task.
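As a cross-check of what such a conversion should produce, the same hex string can also be derived from encodeURIComponent(), which percent-encodes the UTF-8 bytes of a string. The following is only a sketch: utf8Hex() is a hypothetical name, and it assumes that encodeURIComponent() is available in the JavaScript engine in use.

```javascript
// Hypothetical alternative to encode_utf8(): UTF-8 hex bytes via encodeURIComponent().
// Assumption: encodeURIComponent() exists in the JavaScript engine in use.
function utf8Hex(str) {
  // Normalize Windows line endings first, as encode_utf8() does.
  var escaped = encodeURIComponent(str.replace(/\r\n/g, "\n"));
  var out = "";
  for (var i = 0; i < escaped.length; i++) {
    if (escaped.charAt(i) === "%") {
      // "%C3" stands for the byte 0xC3: copy the two hex digits.
      out += escaped.substr(i + 1, 2).toLowerCase();
      i += 2;
    } else {
      // Characters left unescaped are plain ASCII: emit their code as hex.
      var h = escaped.charCodeAt(i).toString(16);
      out += (h.length < 2 ? "0" : "") + h;
    }
  }
  return out;
}

console.log(utf8Hex("ü"));    // c3bc
console.log(utf8Hex("a\nü")); // 610ac3bc
```

Note that encodeURIComponent() throws a URIError for a string containing a lone surrogate, so input from untrusted sources would need a try/catch.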

The next AppleScript command:

tell app "System Events" to keystroke "\`" using (command down)

gives the focus back to TextMate's frontmost document. However, this only works with keyboards based on a US layout. For a layout-independent solution one can use:

function insert(str) {
	var cmd  = "open 'txmt://open?';echo -en ";
	cmd += "'tell app \"TextMate\" to insert ";
	cmd += "«data utf8" + encode_utf8(str) + "» as Unicode text'";
	cmd += "| iconv -f UTF-8 -t MACROMAN | osascript --  &";
	TextMate.system(cmd, null);
}

The only side effect of using open 'txmt://open?' is that TextMate will beep because the line parameter is missing.

Another problem under Mac OS X 10.4.x is that the entire AppleScript code must be encoded in MacRoman. This is done with iconv.

It is very important that the entire shell command is executed as a background process by suffixing it with &. Otherwise TextMate will freeze.
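To see how the pieces described above fit together (the «data utf8» literal, the conversion to MacRoman, and the trailing &), the command string can be assembled in a separate function and inspected before it is handed to TextMate.system(). This is a minimal sketch: buildInsertCommand() is a hypothetical helper, not part of TextMate, and it takes the hex string as its argument.

```javascript
// Hypothetical helper: builds the shell command described above from a
// string of UTF-8 hex bytes (for example "c3bc" for 'ü').
function buildInsertCommand(hexBytes) {
  var script = "tell app \"TextMate\" to insert " +
               "«data utf8" + hexBytes + "» as Unicode text";
  return "open 'txmt://open?';" +            // give focus back to TextMate
         "echo -en '" + script + "' " +      // emit the AppleScript source
         "| iconv -f UTF-8 -t MACROMAN " +   // 10.4.x expects MacRoman
         "| osascript -- &";                 // run in the background
}

console.log(buildInsertCommand("c3bc"));
```

Inside the HTML window the result would then be passed on with TextMate.system(buildInsertCommand(encode_utf8(str)), null).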


Another approach to inserting UTF-8 encoded text into TextMate is to move the osascript call and the hex byte conversion into a shell script:

cat<<-AS | osascript --
tell app "TextMate" to insert «data utf8`cat | \
perl -ne 'print pack "a2", $_ for unpack "(H2)*", $_'`»
AS

This script must be saved in MacRoman; otherwise one has to use iconv before piping it to osascript. But be careful when calling this shell script via TextMate.system(): it will decompose accented characters.