We’ve set up each table with a unique ID. We should be using those rather than album names to relate our tables together. Otherwise, when we change an album name in one table, we also need to change it in the other. By adding a reference to the unique ID, which never changes and never has to change, we only have to update the album name in one place.
The way to do this is to duplicate our table, with an added field.
create table newSongs select songs.*, albums.id as albumID
from songs left join albums on songs.album=albums.album
Once we’ve verified that the new table has what we want, we can rename the old one to “oldSongs” and the new one to “songs”.
Remember that our new table will contain a bunch of songs with no albumID:
select * from newSongs where albumID is NULL
For real data, we would need to go through these records and find out why, fixing those that can be fixed. Once we do fix it, we could get rid of the “album” column in the newSongs table.
One other thing to fix in newSongs is the ID column: it was not transferred over as auto increment. So go ahead and switch the ID column to auto increment.
First, it tells you “invalid default value for ID”. Switch “Null” to YES and the default value to NULL. Then, you’ll get a different error.
Try the following:
select * from newSongs where id=7185
This is the Time Warp from the Rocky Horror Picture Show. It’s in there twice. How many items do we have like this?
select id, artist, song, album, count(*) as idCount
from newSongs group by id having idCount > 1
Having is a form of where that often works better after group by. If you can use where, however, it is almost always a better choice.
Looks like we have 61 songs that ended up getting duplicated. Why?
It turns out that different artists occasionally release albums of the same name. Tracy Chapman released an album called “Crossroads”; and there was a movie called “Crossroads”. Our select statement couldn’t tell the difference.
One solution would be to check the artist as well as the album when doing the transfer. But while it would solve this problem, it will exacerbate the problem of different spellings: instead of only different spellings for album mattering, different spellings for either album or artist will matter.
These are the sorts of problems you run into when you try to reconcile data from multiple sources. Sometimes you can fix the problem at the source; other times, you’ll need to do the work to reconcile them in MySQL. If it is important data, however, it is almost always worth the time now to fix the data, rather than spending more time later after problems have had a chance to build up.