
DeepSeek Unleashes 3FS: A Groundbreaking Distributed File System for AI Workloads
🤖 AI-Generated Content
Introduction
In the rapidly evolving landscape of artificial intelligence (AI), the demand for high-performance computing resources has skyrocketed. As AI models continue to grow in complexity and size, the need for efficient and scalable storage solutions has become paramount. Enter DeepSeek, a pioneering AI research company that has unveiled a groundbreaking distributed file system called 3FS (Fire-Flyer File System), designed to address the unique challenges of AI training and inference workloads.
The Need for a Distributed File System
Traditional file systems often struggle to keep pace with the demanding requirements of AI workloads, which involve processing vast amounts of data and performing complex computations. Bottlenecks in data access and storage can significantly hinder the performance and scalability of AI models, leading to longer training times and inefficient resource utilization. To address these challenges, DeepSeek recognized the need for a purpose-built distributed file system that could seamlessly integrate with modern hardware and provide a robust foundation for AI applications.
The Architecture of 3FS
3FS is a distributed file system built around modern solid-state drives (SSDs) and high-speed RDMA (Remote Direct Memory Access) networks. By combining the throughput of thousands of SSDs with the network bandwidth of hundreds of storage nodes, 3FS lets applications access storage resources in a locality-oblivious manner: an application does not need to know which node holds its data, or be scheduled near that node, to read it at full speed.
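One common way to make access locality-oblivious is to split each file into fixed-size chunks and spread those chunks over many storage targets, so a large read fans out across many SSDs and network links at once. The sketch below illustrates that idea with simple round-robin striping. It is a hypothetical simplification written for this article, not 3FS's actual placement logic; `CHUNK_SIZE`, `NUM_TARGETS`, `ChunkRef`, and `locate` are illustrative names and values.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical parameters for illustration only; not 3FS defaults.
CHUNK_SIZE = 4 * 1024 * 1024  # bytes per chunk
NUM_TARGETS = 8               # storage targets a file is striped over


@dataclass(frozen=True)
class ChunkRef:
    chunk_index: int      # position of the chunk within the file
    target: int           # storage target that holds this chunk
    offset_in_chunk: int  # where the requested bytes start inside the chunk
    length: int           # how many requested bytes come from this chunk


def locate(file_id: int, offset: int, length: int) -> List[ChunkRef]:
    """Map a byte range of a file to the chunks (and targets) that hold it.

    Chunks are assigned round-robin starting at a per-file target, so one
    large read touches many targets instead of a single node.
    """
    refs = []
    first_target = file_id % NUM_TARGETS  # spread different files' first chunks
    end = offset + length
    pos = offset
    while pos < end:
        chunk_index = pos // CHUNK_SIZE
        offset_in_chunk = pos % CHUNK_SIZE
        n = min(CHUNK_SIZE - offset_in_chunk, end - pos)
        target = (first_target + chunk_index) % NUM_TARGETS
        refs.append(ChunkRef(chunk_index, target, offset_in_chunk, n))
        pos += n
    return refs


# A 32 MiB read of file 3 is served by all eight targets in parallel.
print([r.target for r in locate(3, 0, 32 * 1024 * 1024)])  # [3, 4, 5, 6, 7, 0, 1, 2]
```

Because the client computes placement itself from the file's layout, no single node has to be "close" to the application; every read is spread over the cluster by construction.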
I can't believe DeepSeek has even revolutionized storage architecture... The last time I was amazed by a network file system was with HDFS and Ceph. But those are disk-oriented distributed file systems. Now, a truly modern SSD and RDMA network-oriented file system has been born!
At the core of 3FS are stateless metadata services backed by a transactional key-value store (such as FoundationDB), while the storage layer keeps data replicas strongly consistent. This design simplifies application code and makes it easier to reason about; as the 3FS repository puts it, the system:
Implements Chain Replication with Apportioned Queries (CRAQ) for strong consistency, making application code simple and easy to reason about.
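In CRAQ, the replicas of an object form a chain. Writes enter at the head, travel down the chain, and commit at the tail; acknowledgements then flow back toward the head. Unlike plain chain replication, any replica can serve reads: a replica whose newest copy is committed ("clean") answers directly, while one still holding an uncommitted ("dirty") version asks the tail which version is committed and returns that. The toy model below, written for this article, shows that read path for a single object. It is not 3FS's implementation: message passing is replaced by direct method calls, and `Node`, `CraqChain`, `start_write`, and `finish_write` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    versions: Dict[int, str] = field(default_factory=dict)  # version -> value
    clean: int = 0  # newest version known to be committed here (0 = none yet)

    def is_dirty(self) -> bool:
        return max(self.versions, default=0) > self.clean


class CraqChain:
    def __init__(self, length: int):
        assert length >= 2, "a chain needs at least a head and a tail"
        self.nodes = [Node() for _ in range(length)]
        self.next_version = 1

    def start_write(self, value: str) -> int:
        """The write enters at the head and moves toward the tail; each node it
        passes stores a new dirty version. We stop just before the tail to
        model a write that is still in flight."""
        v = self.next_version
        self.next_version += 1
        for node in self.nodes[:-1]:
            node.versions[v] = value
        return v

    def finish_write(self, v: int) -> None:
        """The tail applies and commits v; acks then flow back to the head, and
        every node marks v clean and discards older versions."""
        self.nodes[-1].versions[v] = self.nodes[0].versions[v]
        for node in reversed(self.nodes):
            node.clean = v
            for old in [k for k in node.versions if k < v]:
                del node.versions[old]

    def write(self, value: str) -> None:
        self.finish_write(self.start_write(value))

    def read(self, i: int) -> Optional[str]:
        node = self.nodes[i]
        if not node.is_dirty():
            return node.versions.get(node.clean)  # clean: serve locally
        committed = self.nodes[-1].clean  # dirty: ask the tail what is committed
        return node.versions.get(committed)


chain = CraqChain(3)
chain.write("a")
v = chain.start_write("b")  # head and middle now hold a dirty "b"
print([chain.read(i) for i in range(3)])  # ['a', 'a', 'a']
chain.finish_write(v)
print([chain.read(i) for i in range(3)])  # ['b', 'b', 'b']
```

Every replica returns the same committed value even while a write is in flight, which is the strong-consistency guarantee; spreading clean reads over all replicas is what lets read throughput scale with chain length.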
Performance Benchmarks
The true power of 3FS lies in its ability to deliver high performance for AI workloads. DeepSeek has published benchmarks covering large-scale read throughput, a distributed sort, and KVCache reads for inference.
In a read stress test on a cluster of 180 storage nodes, 3FS reached an aggregate read throughput of approximately 6.6 TiB/s, measured with background traffic from training jobs. On the GraySort benchmark, run on a smaller cluster, 3FS sorted 110.5 TiB of data in just over 30 minutes, an average throughput of 3.66 TiB/min.
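A quick back-of-the-envelope check puts these figures in per-node terms and confirms that the two sort numbers agree. The inputs are only the values quoted above; the conversion assumes 1 TiB = 1024 GiB.

```python
# Figures quoted in the benchmark results above.
READ_THROUGHPUT_TIB_S = 6.6     # aggregate read throughput
STORAGE_NODES = 180             # storage nodes in the read stress test
SORTED_TIB = 110.5              # data sorted in the GraySort run
SORT_THROUGHPUT_TIB_MIN = 3.66  # reported average sort throughput

# Average read throughput each storage node contributed (1 TiB = 1024 GiB).
per_node_gib_s = READ_THROUGHPUT_TIB_S * 1024 / STORAGE_NODES

# Sort time implied by the reported data volume and average throughput.
sort_minutes = SORTED_TIB / SORT_THROUGHPUT_TIB_MIN

print(f"read throughput per storage node: {per_node_gib_s:.1f} GiB/s")  # 37.5 GiB/s
print(f"implied sort time: {sort_minutes:.1f} min")                     # 30.2 min
```

Roughly 37.5 GiB/s per storage node is an average; individual nodes may have been more or less loaded. The implied sort time of about 30.2 minutes matches the "just over 30 minutes" reported.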
Optimizing Inference with KVCache
While 3FS is built primarily for training workloads, it also supports KVCache for inference. KVCache is a technique that caches the key and value vectors of previously processed tokens in a model's decoder layers, so they do not have to be recomputed for every newly generated token. Storing these caches on 3FS offers a cost-effective alternative to DRAM-based caching with much larger capacity, and DeepSeek's benchmarks show high throughput and IOPS (Input/Output Operations Per Second) for these cache operations.
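The compute saving that KVCache provides can be shown with a single attention head. The sketch below, written for this article, appends each new token's key and value to a cache and checks that the result matches recomputing keys and values for the whole prefix at every step. It models only the caching technique, not how 3FS stores the cache on SSDs; `CachedAttention`, `uncached_step`, and the tiny random weights are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over the given positions."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


class CachedAttention:
    """Single-head decoder attention that appends each new token's key and value
    to a cache instead of recomputing them for the whole prefix every step."""

    def __init__(self, wq: Matrix, wk: Matrix, wv: Matrix):
        self.wq, self.wk, self.wv = wq, wk, wv
        self.k_cache: List[Vector] = []
        self.v_cache: List[Vector] = []
        self.kv_projections = 0  # tokens whose K/V were actually computed

    def step(self, x: Vector) -> Vector:
        self.k_cache.append(matvec(self.wk, x))
        self.v_cache.append(matvec(self.wv, x))
        self.kv_projections += 1
        return attend(matvec(self.wq, x), self.k_cache, self.v_cache)


def uncached_step(wq: Matrix, wk: Matrix, wv: Matrix, prefix: List[Vector]) -> Vector:
    """Reference without a cache: recompute K and V for every token in the prefix."""
    keys = [matvec(wk, x) for x in prefix]
    values = [matvec(wv, x) for x in prefix]
    return attend(matvec(wq, prefix[-1]), keys, values)


def random_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d = 4
wq, wk, wv = (random_matrix(rng, d, d) for _ in range(3))
tokens = random_matrix(rng, 6, d)  # six token embeddings of width d
layer = CachedAttention(wq, wk, wv)
for t, x in enumerate(tokens):
    out = layer.step(x)
    ref = uncached_step(wq, wk, wv, tokens[: t + 1])
    assert all(abs(a - b) < 1e-9 for a, b in zip(out, ref))  # same output either way
print(layer.kv_projections)  # 6; recomputing each prefix would take 1+2+...+6 = 21
```

The cache trades memory for compute: it grows by one key and one value per token per layer, which is why moving it from DRAM to a large, fast store is attractive for long contexts.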
Tailored for AI Workloads
One of the key strengths of 3FS is its support for a wide range of AI-related tasks, making it a comprehensive solution for complex AI-driven applications. It covers efficient data preparation, dataloading in which fast random access to training samples removes the need to prefetch or pre-shuffle datasets, and high-throughput parallel checkpointing, so 3FS can streamline the entire AI workflow.
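When random reads are fast, a dataloader can shuffle sample indices in memory and read each sample directly at its offset, instead of maintaining a pre-shuffled copy of the dataset or a prefetch buffer. The sketch below shows that pattern under stated assumptions: samples are fixed-size binary records in one file, an ordinary temporary file stands in for a file on a 3FS mount, and `RECORD_SIZE`, `write_samples`, `RandomAccessSamples`, and `epoch` are hypothetical names. It uses `os.pread`, which is available on Unix-like systems only.

```python
import os
import random
import struct
import tempfile
from typing import Iterator, Tuple

RECORD_SIZE = 16  # hypothetical sample format: four little-endian int32 values


def write_samples(path: str, n: int) -> None:
    """Create a toy dataset where sample i is (i, 2i, 3i, 4i)."""
    with open(path, "wb") as f:
        for i in range(n):
            f.write(struct.pack("<4i", i, i * 2, i * 3, i * 4))


class RandomAccessSamples:
    """Reads sample i directly at byte offset i * RECORD_SIZE."""

    def __init__(self, path: str):
        self.fd = os.open(path, os.O_RDONLY)
        self.n = os.fstat(self.fd).st_size // RECORD_SIZE

    def __len__(self) -> int:
        return self.n

    def __getitem__(self, i: int) -> Tuple[int, ...]:
        raw = os.pread(self.fd, RECORD_SIZE, i * RECORD_SIZE)  # positional read
        return struct.unpack("<4i", raw)

    def close(self) -> None:
        os.close(self.fd)


def epoch(ds: RandomAccessSamples, seed: int, rank: int, world: int) -> Iterator[Tuple[int, ...]]:
    """Shuffle indices (not data) per epoch; each of `world` workers reads a disjoint slice."""
    order = list(range(len(ds)))
    random.Random(seed).shuffle(order)
    for i in order[rank::world]:
        yield ds[i]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "samples.bin")  # stand-in for a file on a 3FS mount
    write_samples(path, 100)
    ds = RandomAccessSamples(path)
    seen = [s for rank in range(4) for s in epoch(ds, seed=1, rank=rank, world=4)]
    ds.close()

print(len(seen), sorted(s[0] for s in seen) == list(range(100)))  # 100 True
```

Only the index list is shuffled, so a new random order per epoch costs almost nothing; the storage system absorbs the resulting random reads.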
Community Reactions and Adoption
The release of 3FS has generated significant excitement within the AI community, with developers and researchers eagerly exploring its potential. Many have praised DeepSeek for their innovative approach and the system's ability to address long-standing challenges in the field.
Reference (Chinese): [High-Flyer Power | 3FS High-Speed File System](https://www.high-flyer.cn/blog/3fs/)
While the adoption of 3FS is still in its early stages, many AI researchers and developers are actively exploring its integration into their workflows. DeepSeek has also released a lightweight data processing framework called smallpond, built on top of DuckDB and 3FS, further expanding the ecosystem around their distributed file system.
Conclusion
With the introduction of 3FS, DeepSeek has once again demonstrated its commitment to pushing the boundaries of AI technology. By addressing the critical need for a high-performance distributed file system tailored for AI workloads, DeepSeek has provided a robust foundation for the development of more advanced and efficient AI applications. As the AI industry continues to evolve, solutions like 3FS will play a crucial role in enabling researchers and developers to unlock the full potential of this transformative technology.