I’m trying to approximate an English-language sentence in the simplest way possible. My method is to pick a number, N, and build each sentence out of words that are random strings of at most N letters.
- If N = 2 a sentence would look like this: d fo mh j e l tx df d
- If N = 5 a sentence would look like this: gh e kj jegns tyu dfa o wdu tah ttauo kk
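
For concreteness, here is a minimal Python sketch of that set-up. It makes two assumptions the description above doesn't pin down: each word's length is drawn uniformly from 1 to N, and each letter is drawn uniformly from a to z. The names `random_word` and `random_sentence` are just illustrative.

```python
import random
import string


def random_word(n, rng=random):
    """Return a random lowercase string of between 1 and n letters.

    Assumption: the length is uniform on 1..n and each letter is
    uniform on a..z.
    """
    length = rng.randint(1, n)  # randint includes both endpoints
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def random_sentence(n, num_words=10, rng=random):
    """Join num_words random words with single spaces."""
    return " ".join(random_word(n, rng) for _ in range(num_words))


if __name__ == "__main__":
    print(random_sentence(2))
    print(random_sentence(5))
```

Passing a seeded `random.Random` instance as `rng` makes the output reproducible.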
So here’s my question:
If I want to approximate the distribution of word-lengths in the English language, which value of N should I choose?
I know it won’t be a very close approximation, but it’s very quick and easy to generate the words using this set-up.
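
To make "approximate the distribution" concrete, one possible way to compare candidates is sketched below. It assumes (my choice, not a given) that word lengths are uniform on 1..N, that closeness is measured by total variation distance, and that the English word-length distribution is estimated from whatever sample text is passed in. The function names are hypothetical.

```python
import re
from collections import Counter


def length_distribution(text):
    """Relative frequency of each word length in text.

    Assumption: a "word" is a maximal run of ASCII letters.
    """
    lengths = [len(w) for w in re.findall(r"[A-Za-z]+", text)]
    if not lengths:
        raise ValueError("text contains no words")
    counts = Counter(lengths)
    return {k: c / len(lengths) for k, c in counts.items()}


def uniform_distribution(n):
    """Word lengths 1..n, each with probability 1/n."""
    return {k: 1 / n for k in range(1, n + 1)}


def total_variation(p, q):
    """Half the sum of absolute differences between two distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def best_n(text, max_n=20):
    """Return the N in 1..max_n whose uniform distribution is closest to text's."""
    target = length_distribution(text)
    return min(
        range(1, max_n + 1),
        key=lambda n: total_variation(target, uniform_distribution(n)),
    )
```

Calling `best_n` on a large sample of English text would give one candidate answer under these assumptions; a different distance measure could well pick a different N.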