February 14, 2005

Baroque Works

by peterb

In the beginning was Ti Kan, who wrote a little CD player app for X windows called xmcd. Like some other players at the time, it had support for entering disc and track names, and remembering them later. Ti went a step further, though; he provided support in the application to submit track names to a central server, the CD Database, or CDDB. Users could download and install the entire CDDB on their hard drive, which would then allow them to magically get track and disc names for discs that anyone else had entered data for. Later, Ti added support to look up track names on the internet. I was an early xmcd user (in fact, I even distributed a binary version for BSD/OS, a fact that I'd completely forgotten until I googled for it.)

Eventually, Ti looked for a way to monetize the CDDB. I don't blame him. The database eventually ended up in the hands of Gracenote, where it remains to this day. Gracenote licenses an SDK and access to their CDDB to companies that want to include it in their product.

Many computer MP3 players that have support for ripping CDs also support looking up CD track names in the CD Database.

There are two interesting things about the CDDB (and really I mean "the CDDBs", since there are databases other than Gracenote's): first, the data in them is provided by the users, rather than by the music publishers, and second, the database is full of errors.

In the early days of the CDDB, I definitely got a thrill of pleasure in buying a new disc, putting it in the drive, and discovering that I was the first CDDB user to do so. Cool! I'd get to "contribute to the community" by entering disc and track information. As time went on, this of course became a rarer and rarer experience. One aspect of the database that I don't know much about is: what happens if two people send in different track data for the same disc? Which one wins? Many clients allow you to "submit corrections to the CDDB," but it's not clear that anything actually ever happens with these. So of course, there are errors. The information is being provided by end users. We make mistakes.

Now, it's not surprising that there are some errors in the CDDB. There would be errors, albeit perhaps to a lesser degree, even if the publishers were providing the data. But if you listen to a lot of classical music, the experience becomes frustrating on a whole new level. The errors in disc and track data are legion compared to what you find in more popular genres, like rock. And the types of errors you see are infinitely more egregious.

There are a few reasons for this.

  • In terms of genre, the CDDB is specific to a ludicrous degree in pop music, but has no ability to classify classical music at all. If I'm categorizing dance music, I can choose between Dance, Electronica, House, Industrial, and about a hundred other categories. If I'm listening to chamber music? "Classical." Opera? "Classical." Baroque? "Classical." Medieval nose-flute? "Classical."
  • Classical music has a little more metadata than pop music. With a rock band, I'm almost always only going to care about the disc name, song titles, and the artist. In classical, it is important to know the composer, and the performer, and often the recording label.
  • God help you if the work you're ripping has more than one disc. Typically, not only will the metadata be entirely wrong, but the metadata for each disc will be entirely wrong in a completely special and unique way. Since probably a greater percentage of classical works (notably opera) are multiple discs, I see this more often.

For example, I just ripped the Chœur des Musiciens du Louvre version of Offenbach's La Belle Hélène to iTunes. The first disc, amazingly, only had one problem (listing the "year" as 1864, the year of the composition, rather than 2001, the year of recording). To only have one error is actually a rare enough occurrence that I'm considering declaring today a National Holiday and commemorating it each year. The second disc had the following problems: The title of the opera was wrong. Apart from not having the accents, the title of the opera includes "Disc 2" in it, even though there's a "disc number" field in the ID3 tags. There's no Composer tag. The "this is part of a compilation" field is checked. And finally, the pièce de résistance is that every song on the disc is called simply "Act II" or "Act III", and the "artist" field is used for the actual title of the track. That's going to make "browse by artist" on my iPod super useful, since it will let me find every song by that superb band, "Ciel! Mon Mari!"

Truly, the mind reels.

Now, all of these issues are correctable in my music player, of course. I can just edit the hideously wrong CDDB data after the fact. But since the whole idea of keeping a centralized database of disc-to-metadata mappings is to free us from having to do that, I find it somewhat frustrating. ("Use a different player that uses a non-Gracenote CDDB" isn't a solution, both because I like everything else about my music player and because I haven't seen any evidence that the non-Gracenote CDDBs are free from this issue.)
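For the curious, here is roughly what that after-the-fact cleanup could look like if you scripted it instead of retyping everything. This is only a sketch under my own assumptions: the `Track` record, its field names, and the `clean` function are made up for illustration and are not any real player's or tagging library's API, and the two patterns only cover the specific mistakes described above (a "Disc N" suffix stuck on the album title, and an act label in the title with the real track name filed under "artist").

```python
import re
from dataclasses import dataclass, replace
from typing import Optional


# Hypothetical record; the field names are illustrative, not a real tag library's API.
@dataclass(frozen=True)
class Track:
    album: str
    title: str
    artist: str
    disc_number: Optional[int] = None


# A trailing "Disc 2", "(Disc 2)", "[disk 2]" or "- Disc 2" on an album title.
DISC_SUFFIX = re.compile(
    r"[\s,:\-]*[\(\[]?\s*dis[ck]\s*(\d+)\s*[\)\]]?\s*$", re.IGNORECASE
)
# A title that is nothing but an act label, such as "Act II".
ACT_ONLY = re.compile(r"^act\s+[ivx]+$", re.IGNORECASE)


def clean(track: Track, performer: str) -> Track:
    """Return a copy of `track` with the two mistakes above repaired."""
    album, disc = track.album, track.disc_number
    match = DISC_SUFFIX.search(album)
    if match:
        # Move the disc number out of the title and into its own field.
        album = album[: match.start()]
        if disc is None:
            disc = int(match.group(1))

    title, artist = track.title.strip(), track.artist
    if ACT_ONLY.match(title):
        # The real track name was filed under "artist"; keep the act as a prefix.
        title = f"{title}: {artist}"
        artist = performer

    return replace(track, album=album, title=title, artist=artist, disc_number=disc)


# Made-up input resembling the bad data described above.
bad = Track(album="La Belle Helene Disc 2", title="Act II", artist="Ciel! Mon Mari!")
fixed = clean(bad, performer="Chœur des Musiciens du Louvre")
print(fixed)
```

Note what it can't do: it has no idea who the composer is, it won't restore the accents, and you still have to tell it the performer by hand. Which is sort of the point. The information has to come from somewhere.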

I'm not sure what the right solution is, but the marginal cost for paying an intern to send the right data to one or more of the various CDDBs can't be more than a few bucks per project.

What do you think the right answer is?

Posted by peterb at February 14, 2005 08:30 PM

I've started using Musicbrainz.org to tag everything after I rip it. It has its own set of problems and I disagree with some of their stylistic choices, but at least they're aiming for consistency. And there's the ability for bad data to actually be corrected, unlike CDDB.

I can't say how good the support is for classical music, but at least my tags are a whole lot cleaner now. I no longer have to remember whether a song is under, say "Nick Cave", "Nick Cave & the Bad Seeds", or "Nick Cave and the Bad Seeds" since they automated that process for me.

Posted by Adam Rixey at February 14, 2005 11:25 PM

I think you're fucked. I should switch to musicbrainz, though, but at this point I've been using iTunes, and CDDB has been doing ok, so I'm lazy.

I'm actually more demanding than you if I'm going to pay for a solution; I want to have something recognize the song digitally, and send me track data (songprint/tuneprint sort of thing), but I have these bootlegs to throw everything off...

Posted by Derrick at February 15, 2005 01:57 AM

Derrick - iTunes works with MusicBrainz. It can correct information you've got in iTunes. However: At least for my songs, it is *much* worse than iTunes.

Classical music is mislabeled or not recognized. Many of the non-mainstream tunes I have are not in MusicBrainz. In short, it's about as useful or as useless as iTunes. Yes, I could use it and correct data as applicable - but by that time, I might as well type things in myself.

I think this is one of the big leverage points the music industry would have if they were smart - they *produce* those thingies, they might as well sell the metadata.

This is one of the classic cases where the web just plain breaks, and it's the classic dotBomb fallacy - "if we accumulate enough information, it sooner or later must become correct".

Posted by Robert 'Groby' Blum at February 16, 2005 09:53 AM

Meanwhile, I always wondered how my computer knew what song was up...it's the little things that make me wonder if I know anything at all.

Posted by Julie Y. at February 16, 2005 09:10 PM

Can anyone help me?

I use iTunes and have a cable internet connection. It usually works fine, but recently I have not been able to connect to the CDDB with iTunes or update my SpySubtract definitions either. What's up? Please write me if you can help, at rrfan77@aol.com.

I'm on:
Comcast Cable

Running norton internet security

Posted by Rich at May 23, 2005 09:26 PM
