Size Matters, But How Much Does It Cost?

What is Context vs. Max Output?

Context Matters.

Pre-ChatGPT: 2,000-token context, shared between prompt and output

November 2022 - ChatGPT 3.5: 4,000-token context, shared between prompt and output; it could output up to 2,000 tokens, though getting it to do so was tricky

Claude 1.0 available in Slack

March 2023 - ChatGPT 4.0 launches with 8k context, but still 2,000 output

May 2023 - 100k context announced for Claude 1.0

July 2023 - Claude 2.0 launches with 4k output

November 2023 - OpenAI releases GPT-4 Turbo with a 128k context window and 4k output; Anthropic releases Claude 2.1 with a 200k context

So far in 2024: ChatGPT stayed at 128k, but added multimodal (video, images, audio) and is now free for all. Claude released 3.0, a family of 3 models that is just starting to go multimodal. Google has launched a 1-million-token context and a new system of context caching.
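To make the difference between context and max output concrete, here is a minimal sketch of the arithmetic. The function name `output_budget` and its parameters are made up for illustration and are not part of any vendor SDK. The figures are the round numbers from the timeline above, with "4k" taken as 4,000 and "128k" as 128,000.

```python
from typing import Optional


def output_budget(context_window: int, prompt_tokens: int,
                  max_output: Optional[int] = None) -> int:
    """Return how many tokens a model could still generate.

    Hypothetical helper for illustration only.
    """
    # The prompt and the reply share one context window.
    remaining = max(context_window - prompt_tokens, 0)
    # Many models also cap the reply length separately.
    if max_output is None:
        return remaining
    return min(remaining, max_output)


# Pre-ChatGPT: 2,000 tokens shared, no separate output cap.
print(output_budget(2_000, 1_500))            # 500
# ChatGPT 3.5: 4,000 shared, at most 2,000 out.
print(output_budget(4_000, 3_000, 2_000))     # 1000
# GPT-4 Turbo: 128k context, 4k output cap.
print(output_budget(128_000, 10_000, 4_000))  # 4000
```

The last case shows why the two numbers matter separately: a large context lets you send a long prompt, but the reply is still limited by the output cap.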

Claude 3 models:

Source: https://www.anthropic.com/api


Cost comparison: Tokens vs. Words (* assumes 750 words per 1,000 tokens)

Model           | Price per M tokens | Price per K tokens | Price per word* | 1,000 words | 50,000 words
Haiku (Input)   | $0.25              | $0.00025           | $0.00000033     | $0.00033    | $0.0167
Haiku (Output)  | $1.25              | $0.00125           | $0.00000167     | $0.001667   | $0.0833
Sonnet (Input)  | $3                 | $0.003             | $0.000004       | $0.004      | $0.20
Sonnet (Output) | $15                | $0.015             | $0.000020       | $0.02       | $1.00
Opus (Input)    | $15                | $0.015             | $0.000020       | $0.02       | $5.00 / 5 = $1.00
Opus (Output)   | $75                | $0.075             | $0.00010        | $0.10       | $5.00
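The per-word figures can be reproduced with a few lines of arithmetic. This is a rough sketch, assuming the same 750 words per 1,000 tokens conversion; the per-million prices are copied from the table, and the names `words_to_tokens` and `cost_for_words` are made up for illustration. Real token counts vary with the text and the tokenizer.

```python
WORDS_PER_1K_TOKENS = 750  # conversion assumed in the table

# Price in USD per million tokens, copied from the table.
PRICE_PER_M = {
    "Haiku (Input)": 0.25,
    "Haiku (Output)": 1.25,
    "Sonnet (Input)": 3.00,
    "Sonnet (Output)": 15.00,
    "Opus (Input)": 15.00,
    "Opus (Output)": 75.00,
}


def words_to_tokens(words: float) -> float:
    """Estimate the token count for a given number of words."""
    return words * 1_000 / WORDS_PER_1K_TOKENS


def cost_for_words(price_per_m: float, words: float) -> float:
    """Estimate the USD cost of processing `words` words."""
    return words_to_tokens(words) * price_per_m / 1_000_000


for model, price in PRICE_PER_M.items():
    print(f"{model:<16} "
          f"1,000 words: ${cost_for_words(price, 1_000):.6f}  "
          f"50,000 words: ${cost_for_words(price, 50_000):.4f}")
```

For example, 50,000 words of Opus output is about 66,667 tokens, which at $75 per million tokens comes to roughly $5.00.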

What are Foundational Models?

Moderated. Offered by large companies, usually integrated into other software.