The copy problem: why digital documents multiply, and what that means for security
Paper sits in one place. Digital documents do not. Every edit, every sync, every backup, every preview makes another copy. Security thinking that treats a file as a singular object fails — here's how to think about it instead.
Paper has a useful property: there is one of it. A contract printed and signed in an office exists in exactly that office, until somebody physically moves it. You can lock it in a drawer and know, with confidence, that it is in the drawer.
Digital documents do not behave this way. A file on your laptop is rarely “a file on your laptop”. It is a file on your laptop, probably also on a sync service, possibly in a backup, possibly in a cache, possibly in a preview generated by a mail server, possibly in the thumbnail index of your file browser, possibly in the operating system’s search index, possibly in the CDN that served it to someone you shared it with, possibly in the mailbox of the person you sent it to, possibly in that person’s phone’s photo roll if they screenshotted it. Whatever password-protection you put on the “original” applies to exactly none of those other copies.
This is the copy problem. It is the single largest difference between reasoning about paper documents and reasoning about digital ones, and security advice that does not account for it will produce plans that fail in surprising ways.
Where copies come from
The first surprise for most readers is how many copies get made without anyone asking. A rough inventory for a single PDF you just saved to your Desktop:
- The original file. One copy.
- The OS search index. Windows Search, macOS Spotlight, and Linux equivalents index the text for fast search. That index contains your document’s words.
- Thumbnails and previews. QuickLook (macOS), Windows Explorer thumbnails, and image preview caches retain compressed representations of the file, often stored unencrypted in the user directory.
- The sync service. If the Desktop is synced (iCloud, OneDrive, Dropbox, Google Drive), within seconds the file is on the provider’s servers and on any other device signed into the same account. Depending on the provider, it may also land in their caches, backup systems, and content-delivery edges.
- Version history. Your sync service keeps previous versions for a defined retention window. Each “save” produces a new historical copy.
- Backups. Your backup software is waiting for this file. Within hours it is in the backup destination, in backup snapshots, in whatever the backup provider replicates to for durability.
- Application state. If the file was open in an editor (Word, Preview, a PDF reader), the application may keep auto-save copies, undo history, and recently-opened shortcuts containing the filename and a thumbnail.
- Temp files. Most applications write working copies to operating-system temp directories during editing. Some clean up after themselves, some don’t.
- Clipboard. If you copied anything from the document, that data lives in the system clipboard, which can be synced across your devices when a cloud clipboard feature is enabled (for example Apple’s Universal Clipboard or the Windows cloud clipboard).
- Network-level caches. TLS prevents intermediaries from reading content in transit, but anything that terminates the connection can keep a copy: TLS-inspecting corporate proxies, CDN edge nodes, and caches of public resources served without encryption. Wherever the traffic isn’t end-to-end encrypted, assume a copy may be retained.
Sending that file to someone else multiplies the count again. Every recipient has their own inventory like the one above.
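Part of this inventory can be checked on your own machine. The sketch below is a minimal illustration, not a forensic tool: it hashes one file and looks for byte-identical copies under directories you choose. The function names (`sha256_of`, `find_exact_copies`) are hypothetical. It only finds exact duplicates it can read; thumbnails, search-index entries, re-encoded previews, and anything on a remote service are invisible to it.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_copies(original: Path, search_roots: list[Path]) -> list[Path]:
    """List files under search_roots whose bytes match `original`.

    Only byte-identical local copies are found. Derived copies
    (thumbnails, index entries, screenshots) will not match.
    """
    original = original.resolve()
    target_size = original.stat().st_size
    target_hash = sha256_of(original)
    matches = []
    for root in search_roots:
        for candidate in root.rglob("*"):
            try:
                # Skip directories and the original itself (including symlinks to it).
                if not candidate.is_file() or candidate.resolve() == original:
                    continue
                # Cheap size check first; hash only plausible candidates.
                if candidate.stat().st_size != target_size:
                    continue
                if sha256_of(candidate) == target_hash:
                    matches.append(candidate)
            except OSError:
                # Unreadable files (permissions, files that vanish mid-scan) are skipped.
                continue
    return matches
```

Calling `find_exact_copies(Path("contract.pdf"), [Path.home()])` (with a hypothetical filename) would list the duplicates in your home directory. An empty result means only that no exact local copy was found, not that no copy exists.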
Why this matters for security
The copy problem has concrete implications for the things people actually try to do:
“I’ll just password-protect this file before sharing”
A password-protected PDF is one copy out of the many. The plaintext exists on your computer before you applied the password. It may exist in your email drafts. It definitely exists on the recipient’s computer after they open it. Any copy they save, screenshot, print, or quote preserves the plaintext, with no password. The password protected one copy; the population is still exposed.
(This is why the Password-protected PDFs article is specifically about controlling one copy at one transit step, not about preventing re-sharing after the fact.)
“I deleted the file”
You deleted one copy. The sync service deleted the corresponding copy after syncing the deletion. The version history still has it. The backup has it. The thumbnail cache may have it. If the file ever travelled by email, a sent-mail folder still has it. Full secure deletion is a multi-place operation, not a click in one place.
“This is encrypted at rest”
Encryption-at-rest, on most services, applies to the currently-live copy on the currently-managed disks. Backups may or may not use the same keys. CDN edge caches may store decrypted content briefly. Staging and dev environments may have snapshots with weaker encryption. “Encrypted at rest” is a statement about one or a few of the copies, not about the whole population.
“The recipient deleted it after reading”
They deleted it from their main view. Their device’s trash retains it until emptied. Their backup has it. Their email archive has it. Their phone’s photo roll, if they screenshotted a page, has it and is probably syncing to iCloud or Google Photos right now. Their messaging app’s media storage has it. The “ephemeral” property of most messaging apps is a promise about the sender’s and service’s copies, not about the recipient’s surviving copies.
What to do with the copy problem
Accepting the copy problem does not mean giving up. It means that effective security practice is a different shape than the one most people imagine.
Reason about populations, not objects. Before sharing or deleting, list the copies that would be created or would need to be destroyed. You will often find that the operation you thought was simple is actually several operations, some of which you have no control over.
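One way to make the population concrete is to write it down as data. The following is a minimal sketch under stated assumptions: the `Copy` record, the `summarize` helper, and every entry in the example inventory are hypothetical, chosen only to show the shape of the exercise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    location: str      # where the copy lives
    you_control: bool  # can you delete or rotate it yourself?
    plaintext: bool    # readable without a key or password?


def summarize(population: list[Copy]) -> dict[str, list[str]]:
    """Split a copy population by what you can actually act on."""
    return {
        "deletable_by_you": [c.location for c in population if c.you_control],
        "outside_your_control": [
            c.location for c in population if not c.you_control
        ],
        "plaintext_outside_your_control": [
            c.location for c in population if c.plaintext and not c.you_control
        ],
    }


# Hypothetical inventory for one PDF emailed to one recipient.
population = [
    Copy("laptop: Desktop/contract.pdf", you_control=True, plaintext=True),
    Copy("laptop: search index", you_control=True, plaintext=True),
    Copy("sync provider: version history", you_control=True, plaintext=True),
    Copy("backup: snapshots", you_control=True, plaintext=False),
    Copy("recipient: mailbox", you_control=False, plaintext=True),
    Copy("recipient: phone screenshot", you_control=False, plaintext=True),
]

summary = summarize(population)
```

The last bucket, plaintext copies outside your control, is the one to look at before sharing: nothing you do afterwards reaches it.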
Minimize where possible. Every time a file moves from one system to another, the copy count grows and your control shrinks. The most effective confidentiality practice is not “protecting copies better”; it is “creating fewer copies in the first place”. Use client-side encryption so the copies that are created are ciphertext, not plaintext.
Choose platforms with managed access, when it matters. For the narrow subset of documents where “anyone who ever had a copy” is an unacceptable set, use a platform where the document is never actually delivered — it is viewed through a permissioned reader (Purview, Vanta, specific client portals). These limit recipient-side copy propagation. They don’t eliminate it (screenshots exist), but they shrink it.
Design your retention on purpose. For the documents you do control, pick explicit retention rules. When do version histories prune? When are backups rotated out? When does the sync service fully delete trashed items? These answers are different per provider and worth knowing, especially for personal information and regulated data.
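The arithmetic behind retention is simple: a deleted document is only gone from your systems once the longest retention window has elapsed. Here is a small sketch of that calculation. The window lengths in `RETENTION_DAYS` are invented for illustration, and `last_copy_expiry` is a hypothetical helper; real values differ per provider and have to be looked up.

```python
from datetime import date, timedelta

# Hypothetical retention windows in days; real values differ per provider.
RETENTION_DAYS = {
    "sync trash": 30,
    "version history": 90,
    "backup snapshots": 365,
}


def last_copy_expiry(
    deleted_on: date, retention_days: dict[str, int]
) -> tuple[str, date]:
    """Return the longest-lived location and the date its copy ages out."""
    location = max(retention_days, key=retention_days.get)
    return location, deleted_on + timedelta(days=retention_days[location])
```

Under these assumed numbers, the backup snapshots decide when the document is really gone, so shortening the sync trash window alone would change nothing.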
Remember the endpoints. A document’s population on your systems is something you can manage. The recipient’s population is not. When sending sensitive information, the threat model should assume it will persist, somewhere, indefinitely.
A mental shift worth making
Most security advice is written as if you were guarding a single object. The useful shift is to see every file as a cloud — a set of copies that spawn, replicate, and decay across systems you do and don’t control. Security is the discipline of knowing where that cloud extends, shrinking it where you can, and accepting where you can’t.
It’s a less satisfying story than “lock up your valuables”. It’s also the one that matches how digital documents actually work, and the one that leads to plans that survive contact with reality.