tokenscript
tokenscript --from english --to vectorized-tokens

Write English,
get vectorized-tokens.

the pre-tokeniser tokeniser you never knew you needed

Your LLM already has a tokeniser. Tokenscript is the pre-tokeniser tokeniser — it tokenises your English before your tokeniser tokenises it.

Is this necessary? No. Is it load-bearing in any pipeline? Also no. Will it look great in your next arXiv preprint? Absolutely.
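For the preprint appendix: nothing here exists as a real package, but a pipeline that tokenises your English before your tokeniser tokenises it might look like this minimal sketch (the `pre_tokenise` and `vectorise` functions and the toy vocabulary are invented for illustration):

```python
# tokenscript, sketched: a hypothetical pre-tokeniser tokeniser.
# No such package exists; everything below is invented for illustration.

def pre_tokenise(english: str) -> list[str]:
    """Tokenise your English before your tokeniser tokenises it."""
    return english.lower().split()

def vectorise(tokens: list[str]) -> list[int]:
    """Map pre-tokens to 'vectorized-tokens' via a toy vocabulary
    built on the fly (first-seen word gets the next free id)."""
    vocab: dict[str, int] = {}
    return [vocab.setdefault(tok, len(vocab)) for tok in tokens]

print(vectorise(pre_tokenise("Write English, get vectorized-tokens.")))
# → [0, 1, 2, 3]
```

Your LLM's actual tokeniser will then re-tokenise all of this anyway, which is the point.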

no spam. no product. possibly no tokens.