Apr
9
I need tools for packaging Python apps as standalone executables so infrequently that I forget which projects are out there:
- py2exe (Windows only)
- cx_freeze (cross-platform)
- py2app (Mac only)
- pyinstaller (cross-platform)
- bbfreeze (cross-platform)
I’ve always liked gettext a lot. Rather than asking you to maintain a database of strings, assigning an id to each, it simply uses the original strings itself as the string id. To me, it’s a classical example of choosing practicality over purity.
The Android localization system, of course, uses the former approach. Each string is a resource with an id, and each language, essentially, has one or more XML files with the proper localized string mapped to each id.
For my apps, I initially had only the original English version and a German translation, those being the only languages I speak, more or less anyway. Whenever I added a new English string, or changed an existing one, I immediately updated the German version as well: simple enough.
For A World Of Photo, I decided to ask the community for help with translations into more languages. Clearly, things were not so simple anymore.
See, with gettext, when the set of strings an app uses changes as part of a new version, you can simply “merge” the new string catalog into each of the translations. Strings that have been removed from the app are removed from the translation files, new strings are added, and strings that have been changed are flagged as “fuzzy”, at least to the extent that the merge tool detects it as a change, rather than a completely new string. That last part is possible because each translation file contains not only the translations, but also the original string that was translated. Remember, it’s the string that is the database key.
As a result, translators simply have to go through the list of new or fuzzy strings, update those, and they’re done.
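To make the merge step concrete, here is a minimal sketch in Python. It is not how gettext's own merge tool is implemented; the function name `merge_catalog`, the similarity cutoff, and the use of `difflib` for fuzzy matching are all assumptions made for illustration.

```python
import difflib


def merge_catalog(template_ids, translations, cutoff=0.6):
    """Merge the app's current source strings into an existing translation.

    template_ids: list of source strings the new app version uses.
    translations: dict mapping old source strings to translated strings.
    Returns a dict mapping each current source string to (translation, is_fuzzy).
    """
    merged = {}
    # Old source strings that no longer exist are candidates for fuzzy matches.
    obsolete = [msgid for msgid in translations if msgid not in template_ids]
    for msgid in template_ids:
        if msgid in translations:
            # Unchanged string: keep the translation as it is.
            merged[msgid] = (translations[msgid], False)
            continue
        close = difflib.get_close_matches(msgid, obsolete, n=1, cutoff=cutoff)
        if close:
            # Looks like an edited string: reuse the old translation, flag it.
            merged[msgid] = (translations[close[0]], True)
        else:
            # Completely new string: nothing translated yet.
            merged[msgid] = ("", False)
    return merged


old_german = {
    "Take a photo": "Foto aufnehmen",
    "Settings": "Einstellungen",
    "Quit": "Beenden",
}
new_strings = ["Take a new photo", "Settings", "Share"]
merged = merge_catalog(new_strings, old_german)
```

In this made-up example, “Settings” keeps its translation, “Take a new photo” inherits the old translation but is flagged fuzzy, “Share” shows up untranslated, and “Quit” is dropped.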
Now, Android’s system has no equivalent tools. Frankly, I wonder how other people do this. I mean, you surely don’t want to have your localization team go through the full list of strings every time you release a new version. Even if you decide you don’t need the ability to detect strings that have changed (you could simply have a policy of using a new id when such a change is necessary), you still need tools to merge changes in your main strings.xml file into each language’s XML resource with new/removed strings (do any such tools exist?).
I suppose you could also have your translators work off a diff, but that seems inconvenient. There’s this huge ecosystem around gettext with all kinds of desktop and web apps that could be utilized.
Google seems to use something internally, because Android’s own string resources are marked with msgid= attributes.
So, I decided the best way for me to deal with this would be to simply convert Android’s XML resources to gettext, do the translations, then import the result back to Android. I found out that the OpenIntents project was doing the same, essentially using a generic xml2po tool found somewhere in the depths of gnome-doc-utils. I kinda got it to work, but ran into a lot of little issues; in the end it felt just too hacky.
The final thing that convinced me that writing a special-purpose tool might be worth my while was the fact that Android’s XML resource format has a bunch of different escaping rules and peculiarities (which I plan to write a separate post on) that translators shouldn’t really have to deal with.
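To give a rough idea of the kind of escaping involved, here is a small Python sketch. It is not taken from android2po, the function name `escape_android_text` is hypothetical, and it only covers a few rules I am reasonably sure of (backslash-escaped quotes and apostrophes, newlines, and a leading @ or ?). XML-level escaping of characters such as & and < is a separate step that is not handled here.

```python
def escape_android_text(text):
    """Escape plain text for use as the body of an Android <string> resource.

    Illustrative only: covers backslashes, quotes, apostrophes, newlines and
    a leading '@' or '?', which Android would otherwise treat specially.
    """
    escaped = (
        text.replace("\\", "\\\\")  # escape backslashes first
        .replace("'", "\\'")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    if escaped[:1] in ("@", "?"):
        # A leading @ or ? would be read as a resource or theme reference.
        escaped = "\\" + escaped
    return escaped


print(escape_android_text("Don't panic"))
```

The point is that a translator editing a .po file should be able to type a plain apostrophe and leave this kind of thing to the tool.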
So, have a look at android2po. You can install it via PyPI:
easy_install android2po
There’s also a README file which explains the basic usage, which is really just a2po init, a2po export and a2po import calls, though at this point there are also various configuration options that should make it quite flexible.
The biggest thing it doesn’t support yet is the <plurals> tag, mainly because I haven’t needed it myself yet. Apart from that, I do believe it should work just fine for most projects.
When you get “fatal: Empty path component found in input” errors from git fast-import, check that your export tool doesn’t write out path values that start with a slash. In my case, my rule file for svn-all-fast-export matched paths like “/project/trunk”, when I should’ve used “/project/trunk/” (note the trailing slash).
Pro-Tip for svn-all-fast-export: Use --metadata=no to get rid of the svn info in the generated git commits. It’s not really advertised as an option.
Since my post timezones in MySQL turned out to be so useful (I keep checking it out every other month), I thought it would be time well spent if I jotted down some notes about another area that sends me googling every time I run into it: Encodings in MySQL.
[One] Text columns in MySQL are annotated with the encoding their data is supposed to be in. If a column doesn’t specify an encoding, a default can be given on the table, database and server levels.
MySQL will use its knowledge about the encoding of a column to make sure that data getting in and out is properly transcoded to whatever encoding the client is using. This is determined by connection variables such as character_set_client, character_set_connection and character_set_results.
So to recap the process, the MySQL client sends a sequence of bytes to the server, the server considers those bytes to use whatever “character_set_client” is set to, will convert the data to the encoding that the target column declares to use, and will then store the result again as an encoded byte string.
[Two] To change the encoding of a complete table, you can use:
ALTER TABLE tbl_name CONVERT TO CHARACTER SET charset_name [COLLATE collation_name];
ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Or modify a single column only:
ALTER TABLE tbl_name MODIFY column1 CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Note that in both cases, MySQL will automatically convert the data stored from the old to the new encoding. This is usually what you want, except when you have one particular problem:
[Three] The data actually stored in the database uses a different encoding than the one that is declared in the column meta data, or worse, the stored data has no valid encoding at all.
You see, it’s quite easy to get MySQL to store invalidly encoded data. For example, if you declare a column as “utf8”, set the character_set_client setting to “latin1”, and then send data in “utf8”, MySQL will apply a Latin1ToUtf8() transcoding function to the utf8 source data before storing it. So effectively, your text has now been encoded in utf8 twice.
Since the default for character_set_client is usually latin1, all you have to do is pipe a UTF8-encoded SQL dump (one that doesn’t set the proper charset variables) into the mysql command line client, and you have a mess.
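The double encoding is easy to reproduce outside of MySQL. The following Python sketch only simulates what the server does with the bytes; it does not talk to a database, and the variable names are made up for illustration.

```python
original = "é"

# The client sends UTF-8 bytes...
sent_bytes = original.encode("utf-8")

# ...but the server was told the client speaks latin1, so it interprets each
# byte as one latin1 character and transcodes that to UTF-8 for storage.
stored_bytes = sent_bytes.decode("latin-1").encode("utf-8")

print(sent_bytes.hex())    # c3a9
print(stored_bytes.hex())  # c383c2a9

# A client that reads the column back as UTF-8 now sees mojibake.
print(stored_bytes.decode("utf-8"))  # Ã©
```

Two bytes went in, four bytes were stored, and the text that comes back is the familiar “Ã©” garbage.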
On the bright side, you can fix those issues directly through MySQL as well. If you have a simple encoding mismatch, it may be enough to simply change the charset declaration of the columns to match the encoding the data actually uses:
ALTER TABLE tbl_name MODIFY column1 VARCHAR(100) CHARACTER SET binary;
ALTER TABLE tbl_name MODIFY column1 VARCHAR(100) CHARACTER SET utf8;
If you naively apply an ALTER TABLE with the wanted charset, MySQL will automatically transcode the data based on what it incorrectly thinks the column’s charset is. So the trick here is to use two steps, and convert to “binary” first, which essentially amounts to “no encoding”. As a result, MySQL won’t touch the data. It simply drops the encoding annotation in the first statement, and sets a new one in the second.
If the data in the table is actually incorrectly encoded, you can fix this using CONVERT().
First, you may want to investigate what actually is wrong, i.e. what data exactly is stored as opposed to what you would like to see stored. You can determine the actual bytes, avoiding any charset conversion by MySQL, using:
SELECT HEX(column1) FROM tbl_name;
Now take for example the case above. Say the column is declared as UTF8, but the UTF8 data we sent was incorrectly passed through a Latin1ToUtf8() conversion. The following statement will then reverse the effect:
UPDATE tbl_name SET
column1=CONVERT(CONVERT(CONVERT(column1 USING latin1) USING binary) USING utf8)
CONVERT transcodes the text from whatever encoding MySQL thinks the data is currently in to the encoding given by USING. We use the same binary trick as before. MySQL thinks the data in column1 is UTF8, so the innermost CONVERT will apply a Utf8ToLatin1() transcoding, reversing the Latin1ToUtf8() function that should never have happened in the first place. However, MySQL now thinks the result is in latin1. If we were to just save that into a UTF8 column, it would be converted back right away. So we first drop the charset annotation by switching to binary, and then we set the charset to utf8, which should now match what the data actually contains. If you wonder whether that last step can be omitted: yes, I believe so. We could just write the data returned by the second CONVERT call directly; it should have the same effect.
The MySQL documentation also has a bunch of info on charsets and converting.
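The same reversal can be sketched in Python, which may make the three steps easier to follow. This is only a simulation of the byte-level effect, and `undo_double_utf8` is a made-up helper name. One caveat: as far as I know MySQL’s latin1 behaves like Windows-1252, so bytes in the 0x80 to 0x9F range may not round-trip exactly the same way as with Python’s latin-1 codec.

```python
def undo_double_utf8(stored):
    """Reverse one accidental latin1-to-utf8 pass over UTF-8 bytes."""
    # Step 1: read the stored bytes as what the column claims they are.
    text = stored.decode("utf-8")
    # Step 2: the Utf8ToLatin1 step; this yields the bytes originally sent.
    original_bytes = text.encode("latin-1")
    # Step 3: those bytes were UTF-8 all along, so just relabel them.
    return original_bytes.decode("utf-8")


broken = b"\xc3\x83\xc2\xa9"  # doubly encoded "é"
print(undo_double_utf8(broken))
```

If the data contains anything that was not doubly encoded, the second step raises a UnicodeEncodeError for characters outside latin-1, which is a handy hint to check a few rows with HEX() before updating the whole table.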
While working on the Twisted server for A World Of Photo, I quickly began missing the convenience of having it automatically restart during development when I had made changes to the code. It turns out that the autoreload module that Django uses is actually pretty generic [1]. One thing Twisted doesn’t like is that the code which checks for file changes is run inside the main thread, and the actual app in a separate thread. That’s easily reversed though. You can find a patched version on bitbucket.
Then, all you need is a simple twistd wrapper:
from twisted.scripts import twistd
from pyutils import autoreload
autoreload.main(twistd.run)
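For the curious, the core of such a reloader is just a loop that compares file modification times. The sketch below is not Django’s autoreload code or my patched version; the helper names are made up, and the part that actually restarts the process is left out.

```python
import os
import sys
import tempfile


def loaded_module_files():
    """Return the source file paths of all currently imported modules."""
    files = (getattr(m, "__file__", None) for m in list(sys.modules.values()))
    return [f for f in files if f]


def snapshot_mtimes(paths):
    """Return {path: mtime} for every path that currently exists."""
    mtimes = {}
    for path in paths:
        try:
            mtimes[path] = os.stat(path).st_mtime
        except OSError:
            continue  # file vanished, e.g. a deleted .pyc
    return mtimes


def changed_files(before, after):
    """Paths that are new or whose mtime differs between two snapshots."""
    return sorted(p for p, mtime in after.items() if before.get(p) != mtime)


# Small demonstration using a throwaway file instead of real modules.
with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as handle:
    demo_path = handle.name
before = snapshot_mtimes([demo_path])
os.utime(demo_path, (0, 0))  # pretend the file was edited
after = snapshot_mtimes([demo_path])
demo_changes = changed_files(before, after)
os.remove(demo_path)
```

A real reloader would call snapshot_mtimes(loaded_module_files()) about once a second and restart the process as soon as changed_files() returns anything.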
Clearly, somebody needs to write a django-treebeard that uses the django-easy-tree API design and django-mptt’s signal approach.
A while ago, Django’s testing framework got transaction-based rollback, which obviously did wonders in terms of test performance. One thing that still bothered me though was the slow, initial table setup. For example, in a modestly sized project of mine with about 40 tables, this would take up to almost a minute. In particular when writing new tests, which is going to be an iterative process, that’s really not acceptable.
Now, one obvious thing to do is to use an in-memory SQLite database for testing purposes. I’ve tried that at times, but ultimately, various MySQL-specific stuff and raw SQL queries always made this an unsatisfying experience.
I’ve now finally realized that there is an easy solution, and I’m perplexed it didn’t occur to me earlier (maybe Linux, to which I’ve recently switched, just puts these kinds of options closer to one’s grasp). And it really is pretty straightforward: Mount a tmpfs, run a second MySQL instance on a different socket/port using this mount as a data dir, and tell Django to use it.
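On the Django side, this only requires pointing the test settings at the second instance. The fragment below is a sketch using the DATABASES setting; the database name, user, and port 3307 are placeholders, so use whatever your RAM-backed instance actually listens on.

```python
# settings_test.py (hypothetical file name)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "myproject",   # placeholder database name
        "USER": "root",        # placeholder credentials
        "PASSWORD": "",
        # Using an IP address rather than "localhost" forces a TCP
        # connection, so the port of the second instance is honored.
        "HOST": "127.0.0.1",
        "PORT": "3307",        # assumed port of the tmpfs-backed mysqld
    }
}
```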
I’ve put the shell script that I’m using on github.
You might want to customize the location of the data directory or the bind options, then simply do:
sudo ./mysqld-ram.sh
and when you’re done, shut down with Ctrl+C.
The tables which previously took a minute to set up now need only two and a half seconds. It even cuts the runtime of the actual tests, which were already using transaction rollback before, in half. Not surprisingly, my motivation to actually write tests and keep them up-to-date has noticeably improved.
The Vista/Windows 7 Boot Manager data in Boot/BCD is simply a registry hive and can be read using a tool like reged.
It contains stuff like /Description/TreatAsSystem, /Description/GuidCache and a whole bunch of guids under /Objects. Presumably, the actually interesting data is there, but unfortunately, it’s all binary.
A guy named Geoff Chappell has some info on what it all might mean.
On the download site, make sure to select “All Releases” and go through the form wizard. Do not use “Click here to View all our products”: it’ll lead you to a huge, impenetrable list of possible downloads for some products.
Since this was a source of confusion for me in the past, I like to make it visually obvious when I’m inside an ssh session in gnome-terminal, vs. on the local machine. This is the best solution I have found so far:
ssh-done() {
    setterm -term linux -inversescreen off;
}
ssh() {
    setterm -term linux -inversescreen on;
    /usr/bin/env ssh "$@";
    ssh-done;
}
The reason why ssh-done is exposed as a separate function is that when ending ssh through Ctrl+C (for example, while at the password prompt), this gives you the ability to manually reset the terminal to normal again.
setterm in theory would also allow you to manually select a foreground and background color, though this didn’t work too well for me; in particular, it broke in various cases when commands tried to colorize their own output.
Totally awesome would be the ability to script gnome-terminal to switch the profile, but this doesn’t seem to exist yet.