5 items tagged “strings”
2024
Tagged Pointer Strings (2015) (via) Mike Ash digs into a fascinating implementation detail of macOS.
Tagged pointers provide a way to embed a literal value in a pointer reference. Objective-C pointers on macOS are 64 bit, providing plenty of space for representing entire values. If the least significant bit is 1 (the pointer is a 64 bit odd number) then the pointer is "tagged" and represents a value, not a memory reference.
Here's where things get really clever. Storing an integer value up to 60 bits is easy. But what about strings?
There's enough space for three UTF-16 characters, with 12 bits left over. But if the string fits ASCII we can store 7 characters.
Drop everything except a-z A-Z.0-9
and we need 6 bits per character, allowing 10 characters to fit in the pointer.
Apple take this a step further: if the string contains just eilotrm.apdnsIc ufkMShjTRxgC4013
("b" is apparently uncommon enough to be ignored here) they can store 11 characters in that 60 bits!
2019
datasette-jellyfish. I learned about a handy Python library called Jellyfish which implements approximate and phonetic matching of strings—soundex, metaphone, porter stemming, levenshtein distance and more. I’ve built a simple Datasette plugin which wraps the library and makes each of those algorithms available as a SQL function.
String length—Rosetta Code
(via)
Calculating the length of a string is surprisingly difficult once Unicode is involved. Here's a fascinating illustration of how that problem can be attached dozens of different programming languages. From that page: the string "J̲o̲s̲é̲"
("J\x{332}o\x{332}s\x{332}e\x{301}\x{332}"
) has 4 user-visible graphemes, 9 characters (code points), and 14 bytes when encoded in UTF-8.
2007
String types in Python 3. bytes are now immutable (just like the bytestrings they are replacing) and a new mutable buffer type has been introduced.
How should JSON strings be represented in Erlang? Erlang’s poor support for strings makes this a surprisingly tricky question.