diff --git a/architecture/README.md b/architecture/README.md
index eefadfc..699660a 100644
--- a/architecture/README.md
+++ b/architecture/README.md
@@ -37,6 +37,7 @@
   - UTF8 encoded, flexible width from 1B (byte) to 4B (bytes): 1,112,064 Unicode characters (code points)
   - ASCII: 7 bits (fits in one byte), 127 characters ➔ [ASCII table](https://upload.wikimedia.org/wikipedia/commons/2/26/ASCII_Table_%28suitable_for_printing%29.svg)
   - [visualization](https://sonarsource.github.io/utf8-visualizer/)
+  - actually, CPython does not store strings (more precisely: unicode objects) as UTF-8: to save memory, each string uses a fixed width of 1, 2 or 4 bytes per code point (Latin-1, UCS-2 or UCS-4), chosen by the largest code point it contains (PEP 393). The gory details are [here](https://docs.python.org/3.14/c-api/unicode.html) ➔ not for the faint-hearted!
 - **hexadecimal notation**:
   - base16 ➔ '0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f'
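The per-string storage formats mentioned in the added README line can be observed from Python itself. The sketch below relies on CPython implementation details: it assumes `sys.getsizeof` reports a fixed header plus one fixed-width slot per code point, so the difference between two string lengths isolates the cost of a single code point. The helper name `bytes_per_code_point` is hypothetical, and other interpreters (e.g. PyPy) may report different numbers.

```python
import sys


def bytes_per_code_point(ch: str, n: int = 1000) -> int:
    """Estimate how many bytes CPython spends per code point for a string of `ch`.

    Hypothetical helper; relies on CPython internals, not on a language guarantee.
    """
    # The fixed object header cancels out, leaving the size of one extra code point.
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


# One sample character per storage width (escapes avoid source-encoding issues).
samples = {
    "a": "ASCII (U+0061)",
    "\u00e9": "Latin-1 (U+00E9)",
    "\u20ac": "BMP (U+20AC)",
    "\U0001F600": "astral (U+1F600)",
}

for ch, label in samples.items():
    stored = bytes_per_code_point(ch)
    utf8 = len(ch.encode("utf-8"))  # size of the same character encoded as UTF-8
    print(f"{label:18} stored: {stored} byte(s)/code point, UTF-8: {utf8} byte(s)")
```

On CPython 3.3+ this should report 1, 1, 2 and 4 bytes per code point in memory, versus 1, 2, 3 and 4 bytes in UTF-8. The width applies to the whole string: a single emoji in an otherwise ASCII string makes every code point in that string take 4 bytes.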