add notes about low-level representation of strings

Fixes #2
Tiziano Zito 2024-08-31 10:39:50 +03:00
parent 8f67903169
commit 8d8bab7242


@@ -37,6 +37,7 @@
- UTF-8 encoded, variable width from 1B (byte) to 4B (bytes): 1,112,064 Unicode characters (code points)
- ASCII: 7 bits (fits in one byte), 128 characters (codes 0 to 127) ➔ [ASCII table](https://upload.wikimedia.org/wikipedia/commons/2/26/ASCII_Table_%28suitable_for_printing%29.svg)
- [visualization](https://sonarsource.github.io/utf8-visualizer/)
- in practice, Python stores strings (more precisely: unicode objects) in different internal formats (1, 2, or 4 bytes per character) depending on the characters they contain, to save memory. The gory details are [here](https://docs.python.org/3.14/c-api/unicode.html) ➔ not for the faint-hearted!
- **hexadecimal notation**:
- base16 ➔ '0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f'
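
The points above about UTF-8 width, in-memory storage, and hexadecimal notation can be checked interactively. This is a minimal sketch, not part of the original notes: the helper names `utf8_width` and `bytes_per_char` are made up for illustration, and the memory measurement assumes CPython (other Python implementations may store strings differently).

```python
import sys


def utf8_width(ch):
    """Number of bytes needed to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


def bytes_per_char(ch, n=1000):
    """Memory growth per extra character in a string made only of `ch`.

    Hypothetical helper: assumes CPython, where all characters of a
    string are stored with the same width (1, 2, or 4 bytes).
    """
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


for ch in ["a", "é", "€", "😀"]:
    encoded = ch.encode("utf-8")
    print(
        f"U+{ord(ch):04X} {ch!r}: "
        f"UTF-8 bytes = {encoded.hex(' ')} ({utf8_width(ch)}B), "
        f"in memory = {bytes_per_char(ch)}B per character"
    )

# Hexadecimal notation: convert between base 16 and base 10
print(int("ff", 16))     # hex string to integer
print(format(255, "x"))  # integer to hex string without the 0x prefix
```

Note that the in-memory width does not have to match the UTF-8 width: `é` takes two bytes when encoded as UTF-8 but, under the CPython assumption above, only one byte per character in memory.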