December 2008 Archives

Py3k admits, fixes, unicode bad.

Py3k is here, and I didn't even find out about it until the day after release. So it's safe to say I'm not a total Python dork. The first bullet point under "Text Vs. Data Instead Of Unicode Vs. 8-bit" in What's New In Python 3.0 made my day:

Python 3.0 uses the concepts of text and (binary) data instead of Unicode strings and 8-bit strings. All text is Unicode; however encoded Unicode is represented as binary data. The type used to hold text is str, the type used to hold data is bytes. The biggest difference with the 2.x situation is that any attempt to mix text and data in Python 3.0 raises TypeError, whereas if you were to mix Unicode and 8-bit strings in Python 2.x, it would work if the 8-bit string happened to contain only 7-bit (ASCII) bytes, but you would get UnicodeDecodeError if it contained non-ASCII values. This value-specific behavior has caused numerous sad faces over the years. (bold text my emphasis)
Yes, I have had days of work wrecked by this very behavior. It was nice to hear them recognize it.
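To make the quoted behavior concrete, here is a minimal Python 3 sketch. The helper names (try_concat, py2_style_concat) are hypothetical. py2_style_concat does not run real Python 2 code; it only emulates the 2.x implicit ASCII decode that happened when you mixed a Unicode string with an 8-bit string.

```python
# Python 3: str holds text, bytes holds binary data. Mixing them raises
# TypeError no matter what the bytes contain.

def try_concat(text, data):
    """Return text + data, or the name of the exception Python 3 raises."""
    try:
        return text + data
    except TypeError as exc:
        return type(exc).__name__


def py2_style_concat(text, data):
    """Hypothetical emulation of Python 2.x coercion: the 8-bit string is
    silently decoded as ASCII, so success depends on the byte values."""
    try:
        return text + data.decode("ascii")
    except UnicodeDecodeError as exc:
        return type(exc).__name__


ascii_bytes = b"caf"
utf8_bytes = "café".encode("utf-8")  # b'caf\xc3\xa9', contains non-ASCII bytes

print(try_concat("menu: ", ascii_bytes))        # TypeError
print(try_concat("menu: ", utf8_bytes))         # TypeError
print(py2_style_concat("menu: ", ascii_bytes))  # menu: caf
print(py2_style_concat("menu: ", utf8_bytes))   # UnicodeDecodeError

# The Python 3 way: decode explicitly and name the encoding yourself.
fixed = "menu: " + utf8_bytes.decode("utf-8")
print(fixed)
```

The Python 3 TypeError fires for both byte strings, so the bug shows up the first time you run the code. The 2.x-style emulation succeeds or fails depending on whether the bytes happen to be ASCII, which is exactly the value-specific behavior that let bad code pass testing and then break on real data.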
