Compress LLM input without sacrificing accuracy

Cut LLM costs, reduce latency, and fit more context into your requests by compressing the input with a compression model.

Input

29 tokens without compression → 14 tokens with compression

Cost per 1M tokens (list price → effective price with compression)

  • gpt-4o: $2.50 → $1.21
  • gpt-5: $1.25 → $0.60
  • gemini-2.5-flash: $0.30 → $0.14
  • gpt-4o-mini: $0.15 → $0.07
  • gemini-2.5-flash-lite: $0.10 → $0.05

Total saved: 52%
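The effective prices above follow directly from the compression ratio in the example input (29 tokens reduced to 14, i.e. roughly 52% fewer billable tokens). A minimal sketch of that arithmetic, using the list prices from the table:

```python
# Effective cost per 1M original tokens when input is compressed
# from 29 tokens down to 14 (the example ratio shown above).
RATIO = 14 / 29  # fraction of tokens remaining after compression

list_prices = {  # USD per 1M input tokens, from the table above
    "gpt-4o": 2.50,
    "gpt-5": 1.25,
    "gemini-2.5-flash": 0.30,
    "gpt-4o-mini": 0.15,
    "gemini-2.5-flash-lite": 0.10,
}

for model, price in list_prices.items():
    effective = price * RATIO
    print(f"{model}: ${price:.2f} -> ${effective:.2f}")

print(f"Total saved: {1 - RATIO:.0%}")
```

Running this reproduces the figures in the table (e.g. gpt-4o at $2.50 → $1.21) and the 52% headline saving; actual savings will vary with the compression ratio achieved on your own inputs.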

Benchmark results

Tested on LongBench v2, a public long-context benchmark.

  • 66% fewer tokens
  • 100% accuracy maintained

Token usage comparison

Without compression: 100%
With Otsofy: 34%

230 questions • 50 runs averaged • GPT-4o-mini

Get Full Access

Be among the first to experience the future of LLM input optimization. We are onboarding new users now: request exclusive early access before the public release.

Otsofy

Reducing LLM costs and latency through intelligent input compression. Built for developers, by developers.

© 2025 Otsofy. All rights reserved.