Improved CDDB?

Posted by: Phoenix42

Improved CDDB? - 08/09/2004 19:21

I came across this thread over on the EAC forums, and it does make for an interesting concept.

Every time a CD is ripped/inserted/used with EAC, EAC stores the FreeDB info for it in a file called CDDB.dat. If the user edits the information, this corrected/updated info is stored instead. So slowly, over time, EAC builds up a database of the user's CDs with the correct info, so that if they insert the same CD again it will have the correct info.

Now I have no idea why someone would be ripping their CDs again unless it is because they want to rip them to a different encoder - that is the only advantage I can see to this. Except for what is mentioned in that thread link above. You see, it is possible to import someone else’s cddb.dat file into yours and gain more info about CDs you don't have, and should you purchase them in the future your bigger cddb.dat file might have the details.

So why do this?
Well, something I'm trying to do is to completely automate the ripping process, which is not that difficult to do. The clincher is the correct naming & tag info; 80% correct is only as good as 20% correct when you have to go back and double check everything.
We all know how good CDDB & FreeDB are, good enough most of the time to point you in the right direction but there is still the need for user input to correct things.
But if you knew that the info was 100% correct every time, you could just sit there feeding in CDs to your heart's content. Automation!

So what I'm asking is: do people think that this is a worthwhile endeavor?
It would take a bit of work on an individual basis; if someone could write some flexible tools to access the cddb.dat file for editing, it would make it even easier. Especially if it could auto correct "Beetles, The" entries and included Roger's (right person?) proper case function that he wrote in Perl (going on a bad memory here).
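
For illustration only, the kind of clean-up asked for here might look something like the sketch below. It is not Roger's Perl function, and the names `fix_trailing_article`, `proper_case` and `SMALL_WORDS` are made up for this example. It assumes a trailing ", The" should be moved back to the front and that a short list of small words stays lower case in the middle of a title.

```python
import re

# Hypothetical list of words kept lower case mid-title (an assumption,
# not an agreed standard).
SMALL_WORDS = {"a", "an", "and", "of", "the", "in", "on", "to", "for"}


def fix_trailing_article(name):
    """Turn 'Beatles, The' into 'The Beatles'; leave other names alone."""
    name = name.strip()
    match = re.fullmatch(r"(.+),\s*(The|A|An)", name, flags=re.IGNORECASE)
    if match:
        return f"{match.group(2).capitalize()} {match.group(1)}"
    return name


def proper_case(title):
    """Capitalise each word, keeping small words lower case mid-title.

    Limitation: this lower-cases everything first, so acronyms such as
    'USA' would come out as 'Usa' and need a manual exception list.
    """
    words = title.strip().split()
    result = []
    for index, word in enumerate(words):
        lower = word.lower()
        is_middle = 0 < index < len(words) - 1
        if is_middle and lower in SMALL_WORDS:
            result.append(lower)
        else:
            result.append(lower[:1].upper() + lower[1:])
    return " ".join(result)
```

Something like `proper_case(fix_trailing_article(artist))` could then be run over each entry before it is written back, with a human still checking the odd cases.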

Thoughts?
Posted by: tfabris

Re: Improved CDDB? - 08/09/2004 19:34

The whole point of CDDB and FreeDB is that they can be updated and corrected over time. You are supposed to be able to send corrections to them. So I'm not sure what you're asking to do that's different?

And by the way, the best automation of this task that I've ever seen is the Rio Central. Stick the disc in the drive and that's all.
Posted by: The Central Guy

Re: Improved CDDB? - 08/09/2004 19:42

Since I started out as a Central owner first, the Central's way of ripping against the internal CDDB was what I was first exposed to. I liked their idea of including the CDDB database on the hard drive and only doing Internet lookups on very obscure discs or new ones that were newer than the CDDB.

Well, after having the Central now for a couple of years plus, I've had a lot of time to use it and rip CDs with it.

I've made quite a few corrections to the tags provided by the on-board copy of CDDB and the web lookups. No problem there, I expect to have to make a few changes in capitalization, spelling, etc.

But it would sure be nice to get an updated CDDB file set. I'm finding now that most of my CD rips are going out to be looked up on the Internet, because most of them are newer titles and they are too new to have been included in the 2001 copy of the Central CDDB....

Oh well, a wish list item I guess....Any chance of an updated CDDB for the Central? I've tried using Gracenote's website and making that inquiry twice, and haven't received any response from them...

Randy
Posted by: tfabris

Re: Improved CDDB? - 08/09/2004 19:49

You could remove the CDDB database from the Central altogether, thus forcing it to do internet lookups each and every time it rips a CD. I don't know which file it is, though. Anyone?

Of course, this only guarantees you get the latest data from Gracenote; it doesn't guarantee the accuracy of that data.
Posted by: The Central Guy

Re: Improved CDDB? - 08/09/2004 19:55

I'm actually interested in updating the CDDB file to help prevent Internet lookups and that extra few seconds. The accuracy of the supplied CDDB data is fine, no problem with that.

Since many of my rips are newer CDs, they just aren't included in the supplied CDDB.

I'm not in front of my home PC right now, but at one time I had it figured out which 2 files were related to the CDDB.

I was hoping to get updated copies of the files when installing the 1.10 Central software, but I was unable to get anywhere with Gracenote. I didn't try Rio Support, just figured that there was no support available...

I was thinking that maybe I could try to hook up with someone that has one of the Gracenote-featured music servers (the "other" brands) and see what their CDDB files / dates looked like...

Randy
Posted by: Phoenix42

Re: Improved CDDB? - 08/09/2004 20:49

Correct, Tony, one can send the corrected data to them, but that seems to result in several entries for the one CD rather than one corrected entry.
I just think that it can be done better, mainly because I have a use for a better version.
Posted by: tfabris

Re: Improved CDDB? - 08/09/2004 20:55

So you're talking about writing something completely new that will compete with freeDB and CDDB? Please give more details of what you envision. And most importantly, how are you going to correct for the human factor?
Posted by: msaeger

Re: Improved CDDB? - 08/09/2004 21:55

The problem I have had with the central ripping / CDDB is that many times the year isn't filled in and the genre is not what I would have picked.
Posted by: gbeer

Re: Improved CDDB? - 08/09/2004 23:23

Quote:
The problem I have had with the central ripping / CDDB is that many times the year isn't filled in and the genre is not what I would have picked.


And thus the nail is hit squarely on the head!
Posted by: siberia37

Re: Improved CDDB? - 09/09/2004 13:03

I used to download the entire CDDB database to my hard drive and query it locally, back when I just had a modem and no broadband. Maybe you could use a similar idea here: download the CDDB set, query all the CDs you want to rip, then take out the "bogus" duplicate entries for a CD that are always there and/or fix the ones that are wrong. Once you're done, everything else should be completely automatable. Not sure how you are going to avoid manual work any other way, though.
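
As a rough sketch of working with a downloaded copy: freedb entries are plain-text files of KEY=value lines (DISCID, DTITLE as "Artist / Album", DYEAR, DGENRE, TTITLE0, TTITLE1, ...), with comment lines starting with "#" and long values continued by repeating the key. The function name `parse_xmcd` and the shape of the returned dictionary are just assumptions for this example.

```python
def parse_xmcd(text):
    """Parse one freedb-style entry into a plain dictionary.

    Repeated keys are concatenated, since long values are split
    across several lines that share the same key.
    """
    fields = {}
    for line in text.splitlines():
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and anything malformed
        key, value = line.split("=", 1)
        fields[key] = fields.get(key, "") + value

    # DTITLE is normally "Artist / Album"; if the separator is missing,
    # the whole value ends up in 'artist' and 'album' stays empty.
    artist, _, album = fields.get("DTITLE", "").partition(" / ")

    tracks = []
    index = 0
    while f"TTITLE{index}" in fields:
        tracks.append(fields[f"TTITLE{index}"])
        index += 1

    return {
        "disc_id": fields.get("DISCID", ""),
        "artist": artist,
        "album": album,
        "year": fields.get("DYEAR", ""),
        "genre": fields.get("DGENRE", ""),
        "tracks": tracks,
    }
```

With every candidate entry for a disc parsed this way, the duplicates can be compared side by side and the wrong ones thrown out before the bulk rip starts.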
Posted by: Phoenix42

Re: Improved CDDB? - 09/09/2004 23:27

The why can be found here and the how here

EAC can be automated from the command line, and a script handles the extra work.
So at this point I have the ability to convert a stack of CDs with the press of a button, but without definitely correct tags.

I've considered a few different ways to go about this: manually editing the tags after the fact (time consuming, error prone), MusicBrainz music ID fingerprinting (23% failure rate), or a custom application using a different DB (time to develop & costs).

So currently I'm back to pursuing getting the tags right from the get go.
So here is what I envision:
EAC creates a local database of CDs that it has seen.
The info in this local database is only as accurate as that local user (the human factor).
This database can be imported by other users into their copy of EAC and merged with their local database.
And slowly the database grows.
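
A minimal sketch of the import-and-merge step, assuming the database can be read into a dictionary keyed by disc ID. The real layout of cddb.dat is not described here, so the dictionary structure and the name `merge_databases` are hypothetical. The policy shown is that the local (already corrected) entry wins and disagreements are reported for a human to look at.

```python
def merge_databases(local, imported):
    """Merge an imported database into the local one.

    Both arguments map a disc ID to an entry (any comparable value,
    e.g. a dict of tags). Returns (merged, conflicts): 'merged' keeps
    every local entry and adds discs only the import knows about;
    'conflicts' lists disc IDs where the two copies disagree.
    """
    merged = dict(local)
    conflicts = []
    for disc_id, entry in imported.items():
        if disc_id not in merged:
            merged[disc_id] = entry        # new disc: take it as-is
        elif merged[disc_id] != entry:
            conflicts.append(disc_id)      # same disc, different tags
    return merged, conflicts
```

Letting the local copy win is only one possible choice; the conflicts list is where an agreed standard would have to settle things.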

Like CDDB & FreeDB, it depends on a community of users and their collective accuracy, and this is where I'm trying to reduce the human factor. I'm hoping the empeg community would help me achieve this: if collectively the community could agree on a standard, and if collectively the community could go through their CDs, query them against FreeDB and then correct the entries.

And slowly a perfected local cddb.dat file would be created, and obviously I stand to gain a lot more out of this than any other individual would; I would be getting an important asset for my business. So if the community helped me out with this then I would owe the community, and how would I return the favour to the community? I don't know, I'll let the community decide that.

If anyone can come up with another solution to this problem, I'm all ears. I am currently in the process of putting my money where my mouth is and going ahead with this. How well it will fly I don't know, but I'm willing to try it out and see.

Oh, and I know that even then it would not be perfect; as Matt and Glenn have rightly pointed out, "the genre is not what I would have picked". All I can hope is that an agreed standard can settle some of this and at least produce something better than the current option.
Posted by: wfaulk

Re: Improved CDDB? - 09/09/2004 23:55

The thing is that it comes down to trusting the people updating the database. What you're trying to accomplish is getting a good set of users by making the buy-in cost fairly high (having to import and export a file manually) so as to not let every no-spelling idiot out there click on the submit button.

This, IMO, is not a good solution. The great thing about FreeDB is that it is going to have almost everything you can think of in it, even if some or much of it is slightly wrong. You want to trade that universality off for correctness, which is probably less useful. What you really need is a large userbase, but a technical way to prevent errors.

As it turns out, I have such an idea. The problem with most of the errors in FreeDB is that people simply submit unchecked info. This usually results in typos. Typos are infrequently duplicated. That is, there is only one correct way to spell "Pink Floyd", but there are many ways to misspell it. If, instead of assuming that the first person who enters the data for a title is correct, you gather a few data submissions first, it should be easy for a program to accept the first duplicated set of data. Since it's more likely that two different people will submit the same correct data than submitting identical incorrect data, this ought to work pretty well.
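
A minimal sketch of that idea, assuming each submission for a disc arrives as a tuple of strings (artist, album, then track titles) and that "duplicated" means an exact match. The function name `first_duplicated` is made up for the example.

```python
def first_duplicated(submissions):
    """Return the first submission that exactly repeats an earlier one.

    'submissions' is an iterable of tuples of strings, in the order
    they were received for a single disc. Returns None while no two
    submissions agree yet, meaning more data is needed.
    """
    seen = set()
    for submission in submissions:
        if submission in seen:
            return submission   # a second person entered the same data
        seen.add(submission)
    return None
```

For example, given the submissions `("Pink Flyod", "Meddle")`, `("Pink Floyd", "Meddle")`, `("Pnik Floyd", "Meddle")` and `("Pink Floyd", "Meddle")`, the two typos never match anything and the correct spelling is accepted on its second appearance. Differences in capitalisation or spacing would still count as different here, so a real version would probably need some normalisation first.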

That being said, if you plan to make money off of this idea, I demand a piece of the action. That is, this idea is mine and I reserve the right to make money from it. (He says as if that has some sort of legal binding.)
Posted by: gbeer

Re: Improved CDDB? - 10/09/2004 00:24

Quote:
That being said, if you plan to make money off of this idea, I demand a piece of the action. That is, this idea is mine and I reserve the right to make money from it.


Too late, I think the big A already has a software patent for that.
Posted by: djc

Re: Improved CDDB? - 10/09/2004 11:30

As others have pointed out, pursuing a "correct" version of tagging data is a very personal proposition. I make some very deliberate choices when I tag my music that others may not agree with. Some examples:

- On two-disc sets, I do not treat them as separate discs; instead I merge them into one big album. The tracks from the second disc are renumbered accordingly (two discs with twelve tracks each become one big album with tracks numbered 1-24).

- On greatest hits compilations, I change the year for each track on the album to match the year that track was released.

- All artists are tagged as they are read, so "The Beatles" are tagged as such. Some people choose to change this to "Beatles, The" or just "Beatles" for easier searching. I also do not put last names before first, so "Peter Gabriel" is tagged just so.

...and so on. These are all choices that I've made, and that others may disagree with. And that's the fundamental flaw I see with current freedb-type services: they try to be one-size-fits-all. If there was just a bit more analysis done on the data, the tags could be adjusted on-the-fly to meet the preferences of the individual user. I think that really is the way to take central tagging to the next level.
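
To make the on-the-fly idea concrete, here is a small sketch in which the stored data stays neutral and each user's preferences are applied at lookup time. The `TagPreferences` class, its two options and `apply_preferences` are hypothetical names for this example, covering only the multi-disc and "The" choices mentioned above.

```python
from dataclasses import dataclass


@dataclass
class TagPreferences:
    merge_discs: bool = True     # renumber a multi-disc set as one album
    article_last: bool = False   # "Beatles, The" instead of "The Beatles"


def apply_preferences(artist, discs, prefs):
    """Adjust neutral tag data to one user's conventions.

    'discs' is a list of discs, each a list of track titles.
    Returns (artist, tracks) where each track is a tuple of
    (disc_number, track_number, title).
    """
    if prefs.article_last and artist.startswith("The "):
        artist = artist[len("The "):] + ", The"

    tracks = []
    running = 0
    for disc_number, titles in enumerate(discs, start=1):
        for track_number, title in enumerate(titles, start=1):
            if prefs.merge_discs:
                running += 1
                tracks.append((1, running, title))   # one big album
            else:
                tracks.append((disc_number, track_number, title))
    return artist, tracks
```

With the defaults, two discs of twelve tracks come back as a single album numbered 1-24; with `merge_discs=False` the second disc starts again at track 1.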

--Dan.
Posted by: gbeer

Re: Improved CDDB? - 10/09/2004 19:26

As bad as CDDB is, or isn't, consider the alternative. After all we could go back to keying in everything.
Posted by: djc

Re: Improved CDDB? - 10/09/2004 19:28

I agree. But if someone was going to put some thought into an alternative, those are the things I'd be working on improving.

--Dan.
Posted by: time

Re: Improved CDDB? - 16/09/2004 21:43

We should work together to create a tag aggregator plug-in which pulls the user's tag info out of jEmplode and sends it to a centralized database. It could provide a fairly reliable collection of tag data that could then be pushed back out à la MusicBrainz and help us to unify all our tag info.

This could be a source for fairly high reliability data considering how much this crowd cares about tag info.

What do you think?
Posted by: Phoenix42

Re: Improved CDDB? - 17/09/2004 16:52

Yikes! Dan, you're the first person I've met who does that with compilations, but I can understand the logic behind it. Especially when you're playing "Hits of the '60s" and all the tracks are tagged 2004.

True, the current services are based on an old standard and our expectations have changed since then. A more modern service would have a lot more data stored per CD and the ability to select from several different conventions. Conventions like "Beatles, The", or merging multi-disc sets into one long CD. Although genre is a whole new ball of wax!

Bitt, I do plan on making money from this; the setup will rip a stack of CDs at the press of a button, or at least try to do so.
It looks like I will have to slowly build up my own cddb.dat file over time, i.e. run each CD through EAC to be identified, store the corrected info in cddb.dat, and later run them through the bulk ripper.
While this would be slower initially, at least I would be certain of having the data the way I want it, though I do like Dan's idea for multi-disc sets.

Thanks guys.
Posted by: tfabris

Re: Improved CDDB? - 17/09/2004 17:15

Quote:
Yikes! Dan, you're the first person I've met who does that with compilations, but I can understand the logic behind it.


I do that, too. It's even mentioned in the FAQ.
Posted by: JBjorgen

Re: Improved CDDB? - 20/09/2004 12:08

I do that too. So I can do playlists by decade.
Posted by: Daria

Re: Improved CDDB? - 20/09/2004 12:13

Quote:
- On greatest hits compilations, I change the year for each track on the album to match the year that track was released.



I want this, but that's a lot of lookups. If I could script it, I would, but there's not really a good way to do either the lookups or the tag info changes.
Posted by: mschrag

Re: Improved CDDB? - 20/09/2004 12:16

Quote:
We should work together to create a tag aggregator plug-in which pulls the user's tag info out of jEmplode and sends it to a centralized database.

I've been tempted to write this as well ... The first rev could just be a global RID database. RID is a little picky, but it would be a good start. I actually ported the MusicBrainz TRM generator to Java as well, but you have to have the server to be able to correlate the IDs with each other.
Posted by: peter

Re: Improved CDDB? - 20/09/2004 12:25

Quote:
I actually ported the MusicBrainz TRM generator to Java as well, but you have to have the server to be able to correlate the IDs with each other.

Yes, this seems a bit crap. In fact you can't even generate a fingerprint without talking to their server, which has obvious privacy issues, let alone issues for disconnected devices such as portables. If there's a good, standalone, open, audio fingerprint generator, I haven't heard of it.

Peter
Posted by: mschrag

Re: Improved CDDB? - 20/09/2004 12:34

Quote:
If there's a good, standalone, open, audio fingerprint generator, I haven't heard of it.

I haven't either ... Before RIDs came along I went on a quest to find an open audio fingerprint and MB was the closest I could find to "open", which is a long way off. It's really unfortunate -- there are so many cool applications that could be written with a good fingerprint generator.
Posted by: Daria

Re: Improved CDDB? - 20/09/2004 12:36

Does it have to be "good"?

http://sourceforge.net/projects/freetantrum/
Posted by: Roger

Re: Improved CDDB? - 20/09/2004 12:53

Quote:
- On greatest hits compilations, I change the year for each track on the album to match the year that track was released.


I try to do this. I find that the Guinness World Records: British Hit Singles and Albums book is extremely useful for this.
Posted by: Daria

Re: Improved CDDB? - 20/09/2004 12:55

Quote:
Quote:
- On greatest hits compilations, I change the year for each track on the album to match the year that track was released.


I try to do this. I find that the Guinness World Records: British Hit Singles and Albums book is extremely useful for this.


Manual lookups. How quaint.
Posted by: Roger

Re: Improved CDDB? - 20/09/2004 13:04

Quote:
Manual lookups. How quaint.


Yeah, but at least I trust the editors of this particular source.
Posted by: Daria

Re: Improved CDDB? - 20/09/2004 13:34

Quote:
Quote:
Manual lookups. How quaint.


Yeah, but at least I trust the editors of this particular source.


That's fair. All of the voluntary sources have problems with data quality.
Posted by: peter

Re: Improved CDDB? - 22/09/2004 12:52

Quote:
Does it have to be "good"?

http://sourceforge.net/projects/freetantrum/

No. But it has to be better than songprint. If I fingerprint a wav and the corresponding MP3, I get signatures with several bytes the same but by no means identical. Either it doesn't work at all, or there's some magic matching software on their (now deceased?) server, and the information's not available to the mortal man.

Peter