Just yesterday (as of writing this), Anna's Archive published a new post, "Backing up Spotify", where they shared that they managed to back up almost all of Spotify's music and metadata (a ~300TB torrent).
They make an interesting point at the start of their blog post about why they chose to back up music in particular:
"But our mission (preserving humanity’s knowledge and culture) doesn’t distinguish among media types. Sometimes an opportunity comes along outside of text. This is such a case."
Obviously people can already find plenty of music through Soulseek, BitTorrent, the Internet Archive, etc., but I think having all of this Spotify data in one bulk torrent will be huge for researchers and data analysts!
To make this data more distributable, they also made an interesting decision: the files are offered at 160kbit/s rather than as high-quality lossless audio (the kind audiophiles typically share). That keeps the files smaller, which makes the torrents themselves more accessible and easier to back up (decentralisation ftw!)
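To get a feel for why the bitrate matters, here's a quick back-of-the-envelope sketch: audio size is roughly duration times bitrate. The 3.5-minute track length is my own assumption, and I'm using uncompressed CD audio (44.1 kHz, 16-bit, stereo, so 1411.2 kbit/s) as the reference point. Real lossless files such as FLAC are usually smaller than that, so treat the ratio as an upper bound rather than a measurement of anything in the torrent.

```python
def track_size_mb(duration_s: float, bitrate_kbit_s: float) -> float:
    """Approximate audio payload size in megabytes (1 MB = 10**6 bytes)."""
    bits = duration_s * bitrate_kbit_s * 1000
    return bits / 8 / 1_000_000


DURATION_S = 210            # assumption: a 3.5-minute track
LOSSY_KBIT_S = 160          # bitrate mentioned in the post
CD_PCM_KBIT_S = 1411.2      # 44_100 Hz * 16 bit * 2 channels, uncompressed

lossy_mb = track_size_mb(DURATION_S, LOSSY_KBIT_S)
cd_pcm_mb = track_size_mb(DURATION_S, CD_PCM_KBIT_S)

print(f"160 kbit/s:          {lossy_mb:.1f} MB")
print(f"uncompressed CD PCM: {cd_pcm_mb:.1f} MB")
print(f"ratio:               {cd_pcm_mb / lossy_mb:.1f}x")
```

Under those assumptions a single track comes out at roughly 4.2 MB versus roughly 37 MB, which adds up quickly across tens of millions of files.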
I want to highlight some key stats for this data:
Metadata for around 256 million tracks (approximately 99.9% of Spotify's entire library)
86 million songs (representing 99.6% of all listens on the app)
In recent years, there has been a surge in AI "music", which has clearly made it onto Spotify. I think this graph they shared shows it most clearly:
I'm curious what the metadata looks like, and whether it could help estimate how likely a track is to be AI-generated... although they mention this:
"The amount of procedurally and AI generated content makes it hard to find what is actually valuable."
which sucks.
These are just my initial thoughts on the project - I'm very curious to see where this data goes and what people do with this. They also shared this small timeline:
[X] Metadata (Dec 2025)
[ ] Music files (releasing in order of popularity)
[ ] Additional file metadata (torrent paths and checksums)
[ ] Album art
[ ] .zstdpatch files (to reconstruct original files before we added embedded metadata)
This whole thing makes me wonder: will it also increase the amount of content available via Soulseek over the coming months? I'm unsure how easy it is to torrent only a selected portion of this data (rather than the entire 300TB).
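For what it's worth, most torrent clients let you tick individual files inside a torrent, so a partial download is possible in principle. How practical that is here depends on how the torrents are actually split up, which I haven't checked. As a rough sketch of the idea, here's how you might pick the most popular files that fit within a storage budget. The `TorrentFile` type, the `popularity` field, and the file names are all hypothetical, not anything taken from the actual release.

```python
from dataclasses import dataclass


@dataclass
class TorrentFile:
    path: str          # hypothetical file path inside the torrent
    size_bytes: int
    popularity: int    # hypothetical score, higher means more popular


def pick_files(files: list[TorrentFile], budget_bytes: int) -> list[TorrentFile]:
    """Greedily keep the most popular files that still fit in the budget."""
    chosen = []
    used = 0
    for f in sorted(files, key=lambda f: f.popularity, reverse=True):
        if used + f.size_bytes <= budget_bytes:
            chosen.append(f)
            used += f.size_bytes
    return chosen


MB = 1_000_000
files = [
    TorrentFile("a.ogg", 4 * MB, popularity=90),
    TorrentFile("b.ogg", 5 * MB, popularity=80),
    TorrentFile("c.ogg", 3 * MB, popularity=70),
]

selection = pick_files(files, budget_bytes=8 * MB)
print([f.path for f in selection])
```

This is a simple greedy pass, not an optimal packing: it skips any file that doesn't fit and moves on to the next most popular one.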
Also - what's the legal side of this like? I looked around and it seems like whoever is behind Anna's Archive is anonymous (which makes sense), but can they get in trouble? If they don't host the files themselves but only centralise discovery of the distributed files, are they at any less risk?
TL;DR: I think this is really good overall. I respect what the Internet Archive is doing, and this seems really similar. We'll see what happens down the line, and I'm curious how easy it will be to also back up future music (after July 2025, which is where this data cuts off).
What do you think?