LinguaShrink: Reducing Token Overhead With Psycholinguistics
As large language models (LLMs) improve their ability to handle complex tasks, the computational cost and efficiency issues caused by long prompts are becoming increasingly prominent. To accelerate model inference and reduce costs, we propose an innovative prompt compression framework called LinguaShrink. Inspired by the observation that LLM performance depends on the density and position of key information in the input prompt, LinguaShrink leverages psycholinguistic principles and the Ebbinghaus memory curve to achieve task-agnostic prompt compression. This effectively reduces prompt length while preserving essential information. We adopted the training method from OpenChat.
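To make the position intuition concrete, here is a minimal Python sketch of how an Ebbinghaus-style retention weight could rank sentences by where they sit in the prompt, so that earlier, more "forgettable" content becomes a candidate for compression first. The function names and the decay constant are illustrative assumptions, not the exact formulation used in LinguaShrink.

```python
import math

def ebbinghaus_retention(position: int, total: int, strength: float = 5.0) -> float:
    """Retention weight R = exp(-t / S) from the Ebbinghaus forgetting curve.

    Here `t` is the normalized distance of a sentence from the end of the
    prompt, so more recent content decays less. `strength` (S) is an
    illustrative decay constant, not a value from LinguaShrink.
    """
    t = (total - 1 - position) / max(total - 1, 1)
    return math.exp(-t / strength)

def score_sentences(sentences: list[str]) -> list[tuple[str, float]]:
    """Attach a position-based retention weight to each sentence."""
    n = len(sentences)
    return [(s, ebbinghaus_retention(i, n)) for i, s in enumerate(sentences)]

if __name__ == "__main__":
    prompt = [
        "Background material that may be safely shortened.",
        "A key constraint the model must respect.",
        "The actual question the user wants answered.",
    ]
    for sentence, weight in score_sentences(prompt):
        print(f"{weight:.3f}  {sentence}")
```

Sentences with low retention weights would be compressed more aggressively, while high-weight sentences near the question are preserved.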
The framework introduces part-of-speech priority compression and data distillation techniques, using smaller models to learn compression targets and employing a KL-regularized reinforcement learning strategy for training. Additionally, we adopt a chunk-based compression algorithm to achieve adjustable compression rates (see the sketch below). We evaluate our method on multiple datasets, including LongBench, ZeroScrolls, Arxiv Articles, and a newly constructed novel test set. Experimental results show that LinguaShrink maintains semantic similarity while achieving up to 26 times compression. Compared to existing prompt compression methods, LinguaShrink improves end-to-end latency by 1.43 times.
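The sketch below shows one way part-of-speech priority compression and chunk-based budgeting could fit together, assuming spaCy for POS tagging. The priority table, chunk size, and keep ratio are illustrative placeholders rather than LinguaShrink's actual configuration.

```python
import spacy

# Illustrative POS priorities: higher rank = more important to keep.
# These ranks are assumptions for the sketch, not LinguaShrink's actual table.
POS_PRIORITY = {
    "PROPN": 6, "NOUN": 5, "VERB": 4, "NUM": 4,
    "ADJ": 3, "ADV": 2, "PRON": 2,
    "ADP": 1, "PUNCT": 1, "DET": 0, "PART": 0, "CCONJ": 0, "INTJ": 0,
}

nlp = spacy.load("en_core_web_sm")

def compress_chunk(text: str, keep_ratio: float = 0.5) -> str:
    """Keep roughly `keep_ratio` of the tokens, dropping low-priority POS first."""
    tokens = list(nlp(text))
    budget = max(1, int(len(tokens) * keep_ratio))
    # Rank token indices by POS priority (descending), ties broken by position.
    ranked = sorted(range(len(tokens)),
                    key=lambda i: (-POS_PRIORITY.get(tokens[i].pos_, 0), i))
    kept = sorted(ranked[:budget])
    return " ".join(tokens[i].text for i in kept)

def compress_prompt(prompt: str, keep_ratio: float = 0.5,
                    chunk_size: int = 512) -> str:
    """Split the prompt into chunks and compress each to an adjustable rate."""
    words = prompt.split()
    chunks = [" ".join(words[i:i + chunk_size])
              for i in range(0, len(words), chunk_size)]
    return " ".join(compress_chunk(c, keep_ratio) for c in chunks)
```

Lowering `keep_ratio` (for example, to 0.3) compresses each chunk more aggressively, which is what makes the overall compression rate adjustable.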