InsertUTF8TextByUsingAnHTMLwindow

HowTo.InsertUTF8TextByUsingAnHTMLwindow History


June 21, 2008, at 11:25 UTC by Hans-Jörg Bibiko - keystroke ` only works with US keyboards
Changed lines 91-93 from:

will give the focus back to TextMate's frontmost document.

The next problem by using Mac OSX 10.4.x is that the entire AppleScript code must be encoded in MacRoman. This is done by using iconv.

to:

will give the focus back to TextMate's frontmost document. But this only works with keyboards based on a US layout. If one wants to generalize it one can use:

function insert(str) {
	cmd  = "open 'txmt://open?';echo -en ";
	cmd += "'tell app \"TextMate\" to insert ";
	cmd += "«data utf8" + encode_utf8(str) + "» as Unicode text'";
	cmd += "| iconv -f UTF-8 -t MACROMAN | osascript --  &";
	TextMate.system(cmd, null);
}

The only side effect of using open 'txmt://open?' is that TextMate will beep, because the line parameter is missing.

The next problem under Mac OSX 10.4.x is that the entire AppleScript code must be encoded in MacRoman. This is done by using iconv.

June 20, 2008, at 21:43 UTC by Hans-Jörg Bibiko - Tiger needs as Unicode text
Changed line 49 from:
	cmd += "«data utf8" + encode_utf8(str) + "»\n";
to:
	cmd += "«data utf8" + encode_utf8(str) + "» as Unicode text\n";
Changed lines 108-109 from:

which must be save in MacRoman. If not one has to use iconv before piping it to osascript. But be aware of calling this shell script via TextMate.system. It will decompose accented characters.

to:

which must be save in MacRoman. If not one has to use iconv before piping it to osascript. But be aware of calling this shell script via TextMate.system. It will decompose accented characters.

June 20, 2008, at 11:07 UTC by Hans-Jörg Bibiko - typo
Changed line 14 from:
  • inserting text with '\n' (a new lines)
to:
  • inserting text with '\n' (new lines)
June 20, 2008, at 10:55 UTC by Hans-Jörg Bibiko - typo
Changed line 2 from:
  1. How to insert an UTF-8 encode text by using an HTML output window
to:
  1. How to insert an UTF-8 encoded text by using an HTML output window
June 20, 2008, at 10:54 UTC by Hans-Jörg Bibiko -
Added lines 1-109:

(:markdown:)

  1. How to insert an UTF-8 encode text by using an HTML output window

(:markdownend:)

_by Hans-Jörg Bibiko_

The task is to write a tmCommand that provides the user with a kind of toolbar for inserting text snippets into the frontmost document. Such a toolbar can be implemented as a tmCommand which displays an HTML window. To insert text, the HTML window uses TextMate's JavaScript bridge function TextMate.system() to run the AppleScript command:

tell application "TextMate" to insert "TEXT".

By doing so the following problems arise:

  • inserting UTF-8 text (compatibility between Mac OSX 10.4.x and Mac OSX 10.5.x)
  • inserting text with '\n' (a new lines)
  • avoiding the decomposition of accented characters by using TextMate.system() (for instance: 'ü' will be inserted as 'u' plus combining '¨')

To solve all of the problems mentioned above, the following approach can be used [_written as a Bash shell script_]:

cat<<-HTML
<html>
<head>
<script type="text/javascript" charset="utf-8">
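function encode_utf8(raw) {
	// Returns the UTF-8 bytes of raw as a string of two-digit hex values.
	// Note: charCodeAt() yields UTF-16 code units; characters outside the
	// BMP (surrogate pairs) are not handled here.
	raw = raw.replace(/\r\n/g,"\n");
	var utftext = "";
	// Pad to two digits so that bytes below 0x10 (a new line is 0a, a tab is 09)
	// do not shift the rest of the hex stream.
	function hex(b) { return (b<16 ? "0" : "") + b.toString(16); }
	for(var n=0; n<raw.length; n++)
	{
		var c=raw.charCodeAt(n);
		if (c<128)
			utftext += hex(c);
		else if (c<2048) {
			// two-byte sequence: 110xxxxx 10xxxxxx
			utftext += hex((c>>6)|192);
			utftext += hex((c&63)|128);
		}
		else {
			// three-byte sequence: 1110xxxx 10xxxxxx 10xxxxxx
			utftext += hex((c>>12)|224);
			utftext += hex(((c>>6)&63)|128);
			utftext += hex((c&63)|128);
		}
	}
	return utftext;
}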
function insert(str) {
	cmd  = "echo -en ";
	cmd += "'tell app \"TextMate\" to insert ";
	cmd += "«data utf8" + encode_utf8(str) + "»\n";
	cmd += "tell app \"System Events\" to keystroke \"\`\" using (command down)' ";
	cmd += "| iconv -f UTF-8 -t MACROMAN | osascript --  &";
	TextMate.system(cmd, null);
}
</script>
</head>
<body>
	<button onclick="insert('Übrigens:\n私は生徒です。')">INSERT</button>
</body>
</html>
HTML

(:markdown:)

  1. Explanations:

(:markdownend:)

Under Mac OSX 10.5.x one could simply write the following JavaScript function, because Leopard's AppleScript supports UTF-8 encoded text:

function insert(str) {
	TextMate.system("osascript -e 'tell app \"TextMate\" to insert \"" + str + "\"' &", null);
}

This approach cannot be used on Mac OSX 10.4.x machines (no direct UTF-8 support), and it decomposes accented characters in _str_ ( 'ü' will be inserted as 'u' plus combining '¨').
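The decomposition mentioned above can be made visible with a short sketch. It assumes that the decomposed form corresponds to Unicode's canonical decomposition (NFD), and it uses String.prototype.normalize, a modern API that is not assumed to exist in TextMate's HTML output window. It is meant for a current runtime such as Node.js only.

```javascript
// Illustration only: shows the two forms of the same visible letter.
var composed = "\u00fc";                    // 'ü' as a single code point
var decomposed = composed.normalize("NFD"); // 'u' followed by combining U+0308

console.log(composed.length);               // 1
console.log(decomposed.length);             // 2
console.log(decomposed === "u\u0308");      // true
console.log(composed === decomposed);       // false
```

Both strings look identical on screen, which is why the difference is easy to miss until a search or comparison fails.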

To be compatible with Mac OSX 10.4.x and to avoid the decomposition, one can make use of AppleScript's «data utf8» hex byte stream class. An example:

TextMate.system("osascript -e 'tell app \"TextMate\" to insert «data utf8c3bc»' &", null);

will insert the letter 'ü' as one code point ('ü' is encoded in UTF-8 as 'c3bc').

In other words, one needs a function which converts the text into a string containing its UTF-8 representation as hex bytes. The JavaScript function encode_utf8() performs this task.
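As a worked example of the conversion, the following sketch reproduces the two-byte arithmetic for 'ü' (code point 0xfc) and cross-checks it against the standard TextEncoder API. The helper names twoByteHex and utf8Hex are hypothetical and not part of the original script, and TextEncoder is assumed to be available only in a modern runtime such as Node.js, not in TextMate's HTML window.

```javascript
// Hypothetical helper: valid only for code points from 0x80 to 0x7ff.
function twoByteHex(c) {
  var lead = (c >> 6) | 192;   // 110xxxxx: the upper five bits
  var trail = (c & 63) | 128;  // 10xxxxxx: the lower six bits
  return lead.toString(16) + trail.toString(16);
}

// Hypothetical cross-check using TextEncoder (modern runtimes only).
function utf8Hex(str) {
  var bytes = new TextEncoder().encode(str);
  return Array.from(bytes, function (b) {
    return (b < 16 ? "0" : "") + b.toString(16); // always two hex digits
  }).join("");
}

console.log(twoByteHex(0xfc));   // c3bc
console.log(utf8Hex("\u00fc"));  // c3bc
console.log(utf8Hex("\n"));      // 0a
```

The last line shows why every byte has to be written with two hex digits: a new line must appear as 0a, not as a.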

The next AppleScript command:

tell app "System Events" to keystroke "\`" using (command down)

will give the focus back to TextMate's frontmost document.

The next problem by using Mac OSX 10.4.x is that the entire AppleScript code must be encoded in MacRoman. This is done by using iconv.

It is __very important__ that the entire shell command is executed as a background process by suffixing it with &. Otherwise TextMate will freeze.
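To see how these pieces fit together, the following sketch assembles the shell command as a string and prints it instead of handing it to TextMate.system(). The helper name buildInsertCommand is hypothetical, and the keystroke step for returning the focus is left out for brevity.

```javascript
// Hypothetical helper: returns the shell command instead of running it.
function buildInsertCommand(hexBytes) {
  var cmd = "echo -en ";
  cmd += "'tell app \"TextMate\" to insert ";
  cmd += "«data utf8" + hexBytes + "» as Unicode text' ";
  cmd += "| iconv -f UTF-8 -t MACROMAN ";  // AppleScript source as MacRoman for 10.4.x
  cmd += "| osascript -- &";               // trailing & runs it in the background
  return cmd;
}

console.log(buildInsertCommand("c3bc"));
```

Printing the command first makes it easier to check the quoting before it is run from the HTML window.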


Another approach to inserting UTF-8 encoded text into TextMate would be to move the osascript call and the hex byte conversion into a shell script:

cat<<-AS | osascript --
tell app "TextMate" to insert «data utf8`cat | \
perl -ne 'print pack "a2", $_ for unpack "(H2)*", $_'`»
AS

which must be save in MacRoman. If not one has to use iconv before piping it to osascript. But be aware of calling this shell script via TextMate.system. It will decompose accented characters.