Explain storing files to disk with concurrency
Company: Databricks
Role: Software Engineer
Category: Software Engineering Fundamentals
Difficulty: medium
Interview Round: Onsite
You are working on the backend of a server that receives file uploads (for example, images or documents) from clients. You need to design and explain how the server safely writes these files to local disk in a multi-threaded environment.
Answer the question from both a **systems programming** perspective (how data gets to disk at the OS level) and a **Java concurrency** perspective (how to make the component thread-safe and correct under concurrent access).
### Requirements
- Multiple threads or requests may try to store files concurrently (for different paths, and potentially the same path if a client retries).
- Once the server responds "success" to a client, the file should be durably stored (i.e., it must survive a process crash or a machine restart).
- Partially written or corrupted files should not be visible to readers.
- The design should be reasonably efficient for large files and high concurrency.
### Tasks
1. **Explain how writing a file to disk works at a high level**:
- What happens when an application writes to a file (user space → kernel → disk)?
- What is the role of OS buffers, page cache, and system calls like `fsync`/`fdatasync`?
2. **Design a thread-safe file storage component in Java** that exposes an API such as:
- `storeFile(userId, fileId, InputStream data)`
which writes the file’s content to a specific path on disk.
In your design, cover:
- How you handle multiple threads storing different files concurrently.
- How you handle the case where multiple threads might (intentionally or accidentally) write to the **same** file path.
- How you avoid interleaved writes and corrupted files.
3. **Durability and atomicity**:
- How do you ensure that, after the server returns success, the data is safely on stable storage (or at least recoverable after a crash, for example through the filesystem's journal)?
- How do you prevent partially written files from appearing if a crash occurs mid-write (for example, by using a temporary file and atomic rename)?
4. **Error handling and retries**:
- How do you handle I/O errors (disk full, permission issues, transient failures)?
- How do you make the API idempotent so that if a client retries the upload, you don’t end up with a corrupted or duplicated file?
5. **Performance considerations**:
- How would you use buffering, streaming, and possibly NIO to efficiently handle large files and high concurrency?
- What trade-offs do you make between strict durability (frequent `fsync`) and throughput?
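As an illustration of the temporary-file-plus-atomic-rename approach mentioned in the durability task, here is a minimal sketch, not a definitive implementation. The class name `AtomicFileWriter` and the method `storeAtomically` are hypothetical names introduced for this example. It assumes a POSIX-like filesystem where a rename within one directory atomically replaces an existing target; the Java documentation leaves the behavior of `ATOMIC_MOVE` onto an existing file implementation-specific, so this should be verified on the target platform.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: write to a temp file, fsync it, then rename it into place. */
final class AtomicFileWriter {

    private AtomicFileWriter() {}

    static void storeAtomically(Path target, InputStream data) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Files.createDirectories(dir);
        // The temp file lives in the target directory so the rename stays on one filesystem.
        Path tmp = Files.createTempFile(dir, ".upload-", ".tmp");
        boolean moved = false;
        try {
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                byte[] buf = new byte[64 * 1024]; // stream in chunks; never hold the whole file in memory
                int n;
                while ((n = data.read(buf)) != -1) {
                    ByteBuffer bb = ByteBuffer.wrap(buf, 0, n);
                    while (bb.hasRemaining()) {
                        ch.write(bb); // a single write may be partial, so loop until drained
                    }
                }
                ch.force(true); // ask the OS to flush file data and metadata (fsync-like)
            }
            // Readers see either the old file or the complete new one, never a partial write.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
            moved = true;
            fsyncDirectory(dir);
        } finally {
            if (!moved) {
                Files.deleteIfExists(tmp); // clean up the temp file after a failed write
            }
        }
    }

    /** Best-effort flush of the directory entry so the rename itself survives a crash. */
    private static void fsyncDirectory(Path dir) {
        try (FileChannel d = FileChannel.open(dir, StandardOpenOption.READ)) {
            d.force(true);
        } catch (IOException e) {
            // Assumption: some platforms (for example Windows) cannot open a directory
            // as a channel; this sketch treats the directory sync as best effort.
        }
    }
}
```

A caller would return success to the client only after `storeAtomically` returns without throwing. Calling `force` once per file, rather than after every chunk, is one way to balance durability against throughput.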
Describe your design and reasoning step-by-step, focusing on correctness, concurrency control, and practical implementation strategies.
Quick Answer: This question evaluates systems programming concepts (OS I/O, page cache, `fsync` semantics) and Java concurrency mechanisms (thread safety, synchronization, atomicity) for safely writing files under concurrent access.
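One way to make the synchronization part concrete is lock striping: writes to the same logical key are serialized, while writes to different keys mostly run in parallel. The sketch below is only one possible approach under stated assumptions. The class name `StripedPathLocks`, the method `withLock`, and the key format (for example `userId + "/" + fileId`) are hypothetical and introduced for illustration. It only coordinates threads inside a single JVM; multiple server processes would need a different mechanism.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical helper: a fixed pool of locks selected by the hash of a key. */
final class StripedPathLocks {

    private final ReentrantLock[] stripes;

    StripedPathLocks(int stripeCount) {
        if (stripeCount <= 0) {
            throw new IllegalArgumentException("stripeCount must be positive");
        }
        stripes = new ReentrantLock[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new ReentrantLock();
        }
    }

    /** Runs the action while holding the lock that the given key hashes to. */
    <T> T withLock(String key, Callable<T> action) throws Exception {
        // The same key always maps to the same stripe, so same-path writes never interleave.
        // Different keys may occasionally share a stripe, which costs parallelism, not correctness.
        ReentrantLock lock = stripes[Math.floorMod(key.hashCode(), stripes.length)];
        lock.lock();
        try {
            return action.call();
        } finally {
            lock.unlock(); // always release, even if the action throws
        }
    }
}
```

The fixed-size array avoids the cleanup problem of a per-path lock map that grows without bound. The trade-off is that unrelated paths sometimes contend for the same stripe; a larger stripe count reduces that.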