It never fails to amaze me how little emphasis is placed on data during a website rebuild/re-launch/migration. Time and time again I see firms jumping to wireframes and design comps before even considering what is going to drive this widget or that amazing image carousel. Functional requirements are pored over, grand visions and sell sheets are created, and jobs are quoted before anyone takes that crucial deep-dive into the data. I’ve even worked on site migrations where the old site had previously been migrated without much thought given to data, resulting in a double-dose of data problems.

Let’s peel the lid off data migration and look at some pitfalls, perils, gotchas, and best practices.

Data Containers

Take a close look at your data containers on the old site vs. the new one. They will most likely vary drastically. Everything has to map to something else, and your site migration is a great time to make previously hard-coded things relational, or to make that misused field in the old system make sense in the new system. Consider the following container differences:

Un-stuff fields that contain things that should be separate: What might have been embedded in body text on the old site might be broken out into a multi-table relationship or separate field on the new site (images, file attachments, charts, sidebar text, abstracts/decks). This can get complex but it’s well worth it. On your new site, CONTROL your input to keep things clean. Limit WYSIWYG editors to the basics and tightly control what types of markup you’ll allow on the new site.

Normalize stuff that’s redundant: Take the chance to normalize fields in the new system. For example, if an “article” has an “article type” field with a text value in it, maybe there’s an opportunity to turn that into a relational tie to a taxonomy set in the new system.
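Here's a rough sketch of that “article type” example in Python. The field names (`id`, `title`, `article_type`) and the idea of numbering terms as they are first seen are assumptions for illustration, not how any particular CMS does it.

```python
def normalize_article_types(articles):
    """Replace free-text article types with ids into a shared term table.

    `articles` is a list of dicts with hypothetical keys:
    "id", "title" and "article_type" (free text from the old site).
    """
    terms = {}        # cleaned label -> term id
    normalized = []
    for article in articles:
        # Collapse whitespace and case so "News " and "news" become one term.
        label = " ".join(article["article_type"].split()).lower()
        term_id = terms.setdefault(label, len(terms) + 1)
        normalized.append({
            "id": article["id"],
            "title": article["title"],
            "article_type_id": term_id,
        })
    return terms, normalized
```

The returned `terms` dict is what you would load into the new taxonomy set; the article rows then carry only the id.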

Bad Links and Redirects

Your old data could potentially be loaded with bad links, both internal and external. These can be killers when it comes to SEO and usability on your newly launched site.

Create redirects: Figure out what the old content aliases were on the old site and create 301 redirects to the new site. This might be taken care of purely on the data side by using a redirect module of some sort in the new system and populating the module in your conversion, or might involve creating a redirect map at the webserver level of the stack. Don’t forget topical landing pages or popular aggregation pages on the site. Those need redirects too.
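If you go the webserver route, the redirect map can be generated straight from your conversion data. This sketch assumes you already have pairs of old and new paths, and that the server is Apache with mod_alias, so it emits `Redirect 301` lines. Adjust the output format for whatever sits in your stack.

```python
def build_redirect_lines(alias_pairs):
    """Turn (old_path, new_path) pairs into Apache-style 301 redirect lines.

    Assumes paths contain no spaces; quote or encode them first if they do.
    """
    seen = {}
    lines = []
    for old_path, new_path in alias_pairs:
        old_path = "/" + old_path.strip().lstrip("/")
        new_path = "/" + new_path.strip().lstrip("/")
        if old_path == new_path:
            continue  # nothing moved, no redirect needed
        if old_path in seen:
            if seen[old_path] != new_path:
                raise ValueError(f"conflicting targets for {old_path}")
            continue  # exact duplicate, skip it
        seen[old_path] = new_path
        lines.append(f"Redirect 301 {old_path} {new_path}")
    return lines
```

Raising on conflicting targets is deliberate: two different destinations for one old URL is a mapping problem you want to see before launch, not after.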

Scan for bad links in your old content: Try to identify patterns in your embedded links and either fix them en masse during your migration or remove them. You might write a routine that parses out <A> tags, tests them for an HTTP response, then cleans them accordingly. Your new CMS might even have a link checking module you could run to scrub the data once it comes in.
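A link-scanning routine along those lines might look like the sketch below, using only the Python standard library. It assumes links are absolute URLs (relative ones need resolving against the old site's base URL first) and treats any 4xx/5xx response or connection failure as “bad”. The function names are mine.

```python
from html.parser import HTMLParser
import urllib.error
import urllib.request


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (tag names arrive lowercased)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def http_status(url, timeout=10):
    """Return the HTTP status for url, or None if it could not be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
    except (urllib.error.URLError, ValueError, OSError):
        return None


def find_bad_links(html, status_of=http_status):
    """Return (url, status) for every link that fails or answers 4xx/5xx."""
    bad = []
    for url in extract_links(html):
        status = status_of(url)
        if status is None or status >= 400:
            bad.append((url, status))
    return bad
```

Some servers reject HEAD requests, so a production version would probably retry with GET before declaring a link dead. The `status_of` parameter is there so you can swap in a cached or rate-limited checker.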

Keep the old stuff somewhere close by

I always like to stash the original data, in its original form, in the new versions of each data item. This might take the form of a subset of fields called “Archive data” in your new data containers. Doing this covers your bases for anything you might not have thought of in the initial conversion. At the very LEAST, stash the old system’s primary key/ID somewhere with the new version of each piece of data. That way you can always link back to the old data to pull in something you may have forgotten or update something that did not come across right the first time. Clearly you don’t want to double the size of your new database if you can avoid it, but at least keep some link from old to new.
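As a minimal sketch, a conversion step could carry both the old ID and a serialized copy of the old row. The `legacy_id` and `archive_data` field names and the `field_map` shape are assumptions here, not a standard.

```python
import json


def to_new_record(old_row, field_map, id_field="id"):
    """Map an old row to the new schema while keeping a trail back to it.

    field_map maps old field names to new field names (hypothetical shape).
    """
    new_record = {new: old_row.get(old) for old, new in field_map.items()}
    # Always keep the old primary key so the record can be traced later.
    new_record["legacy_id"] = old_row[id_field]
    # Optional: stash the whole original row as JSON in one archive field.
    new_record["archive_data"] = json.dumps(old_row, sort_keys=True, default=str)
    return new_record
```

If database size is a concern, drop the `archive_data` line and keep only `legacy_id`.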

Figure out what you want to do with your images

I touched on this a bit in the “data containers” section but it’s worth a closer look. Getting images right on your new site is key.

Rip out embedded images: If image tags were embedded in your body text by years of reckless WYSIWYG use, they have to come out and be properly related to each content piece. 
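One way to pull the images out, sketched in Python: collect each image's `src` and `alt`, then strip the tags from the body. This assumes reasonably tame markup (no `>` characters inside attribute values), and what you do with the returned list depends on how your new system relates images to content.

```python
import re
from html.parser import HTMLParser

# Assumption: no ">" appears inside an <img> attribute value.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


class ImageCollector(HTMLParser):
    """Records src and alt for every <img> tag encountered."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr_map = dict(attrs)
            self.images.append({
                "src": attr_map.get("src") or "",
                "alt": attr_map.get("alt") or "",
            })


def split_out_images(body_html):
    """Return (body without <img> tags, list of image dicts in document order)."""
    collector = ImageCollector()
    collector.feed(body_html)
    collector.close()
    cleaned = IMG_TAG.sub("", body_html)
    return cleaned, collector.images
```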

Figure out what image to use for what on the new site: If you have multiple images per content piece, you should figure out some way to determine which one is the “key” image, if applicable, for various scenarios. For example, what image will be used for a thumbnail vs. a homepage carousel/slider vs. an article view? Perhaps one is more impactful or appropriate in one usage vs. another.

Take a look at aspect ratio and image size in your site design: You’ll often see site design comps with perfectly square thumbnails alongside abstracts for, say, a list of articles. What happens when a hugely vertical or horizontally biased image goes into that square? Say, an image of a crane or skyscraper? Apply the same thinking to anywhere else you’ll see images in the new site. Then think of your images on a responsive version or mobile version of the site. Your design and migration should consider this stuff in order to get this crucial visual component right. What happens when a piece of content has no images but is included in a slideshow or article list with thumbnails? The design should handle these scenarios gracefully.

Get your resizing right in the new site: If your new CMS does automatic image resizing, be sure to add resampling actions to ensure the image formats are consistent and lightweight. Consider adding fixed canvases behind all images in fixed design areas to ensure that they don’t get skewed in their new containers. 
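The fixed-canvas idea comes down to a little arithmetic: scale the image to fit inside the canvas without changing its aspect ratio, then center it. Here is a sketch of just that math, with no imaging library involved; the function name and the choice not to upscale small images by default are mine.

```python
def fit_on_canvas(image_w, image_h, canvas_w, canvas_h, allow_upscale=False):
    """Return (new_w, new_h, offset_x, offset_y) for centering an image
    on a fixed canvas while preserving its aspect ratio."""
    scale = min(canvas_w / image_w, canvas_h / image_h)
    if not allow_upscale:
        scale = min(scale, 1.0)  # never blow up a small image
    new_w = max(1, round(image_w * scale))
    new_h = max(1, round(image_h * scale))
    # Integer offsets that center the scaled image on the canvas.
    offset_x = (canvas_w - new_w) // 2
    offset_y = (canvas_h - new_h) // 2
    return new_w, new_h, offset_x, offset_y
```

For the skyscraper case, a 400x1600 image on a 200x200 thumbnail canvas comes out 50x200 with 75 pixels of canvas on either side, rather than being squashed into a square.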

Rename your image files if possible and don’t forget alt data: Now’s the time to rename those image files programmatically if possible and add that alt text! The world is watching and making your images easy to find could mean a ton more traffic to your site from organic image searches.
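Programmatic renaming can be as simple as building a slug from the content title. This sketch assumes you have a title per image and some unique number (a content or image ID, say) to keep names from colliding.

```python
import re
import unicodedata
from pathlib import PurePosixPath


def seo_filename(title, original_name, suffix_id):
    """Build a lowercase, hyphenated file name from a title.

    Keeps the original extension and appends suffix_id for uniqueness.
    """
    # Strip accents: "Café" -> "Cafe".
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-") or "image"
    extension = PurePosixPath(original_name).suffix.lower()
    return f"{slug}-{suffix_id}{extension}"
```

So `seo_filename("Café Crane at Dawn!", "IMG_0042.JPG", 17)` gives `cafe-crane-at-dawn-17.jpg`. Remember that renamed files need their old URLs redirected, too.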

Taxonomy, Taxonomy, Taxonomy

No, that’s not keyword stuffing. Taxonomy is so important that it should be mentioned at least thrice. Your old site’s taxonomy, especially if it’s a really old site, is probably a nightmare. What innocently started off as 10 pure, topical, meaningful terms in 1999 has grown into a 1,000+ term list that contains all sorts of stuff: maybe awards lists, article types, outdated terms, redundant terms, people’s names, ad targeting tags for a system that no longer exists, stuff entered by that one intern back in 2003, and who knows what else. This can’t be carried over to the new site. Taxonomy, if done properly, should act as the backbone of your site’s content organization. It has to be done right and now’s the time to clean it up.

Consolidate: Map redundant terms into one term wherever possible. Take overly granular terms and map them to more general terms.

Strategize: Think about your new site’s structure. Think about what you’re actually using taxonomy for. Are you using it to determine what areas of the site something should show up in? Are you using it to relate content to other content? Are you using it to target or drive advertising? Think about it. Hard. Then...

Create a “dream” taxonomy set that will drive the new site: Think about the set or sets of terms that will drive the new site. Will your new vocabulary be multi-tier? If so, is it that way because your content will be organized based on these tiers OR because it’ll make it easier for editors to flag the content? Then...

Map the taxonomy: Map old terms to new terms and map content previously tied to old terms to new terms.
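The consolidate-and-map steps can be driven by a simple lookup table. In this sketch the table maps a lowercased old term to a new term, with `None` meaning “retire this term”. Those conventions are assumptions for illustration. Anything missing from the table is reported rather than silently dropped.

```python
def remap_terms(content_terms, term_map):
    """Apply an old-to-new term map to every piece of content.

    content_terms: {content_id: [old term, ...]}
    term_map: {lowercased old term: new term, or None to retire it}
    Returns (remapped content terms, sorted list of unmapped old terms).
    """
    remapped = {}
    unmapped = set()
    for content_id, old_terms in content_terms.items():
        new_terms = []
        for term in old_terms:
            key = term.strip().lower()
            if key not in term_map:
                unmapped.add(term)  # needs a human decision
                continue
            target = term_map[key]
            # Skip retired terms and avoid tagging content twice.
            if target is not None and target not in new_terms:
                new_terms.append(target)
        remapped[content_id] = new_terms
    return remapped, sorted(unmapped)
```

The `unmapped` list is the useful part on a first pass: it tells you which old terms your mapping has not accounted for yet.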


Files

Don’t forget about your old site’s files: PDFs, PPTs, uploaded videos, etc. Did some WYSIWYG editor on the old site put some files in a crazy folder deep down inside the web server? Are other files in some other totally disjoint file folder? Perhaps you have files stored in the database as BLOB data? Remember links to PDFs and other file uploads, too. Sift through any body text and try to sniff out links to files so that you can handle them in a uniform manner in the new system.
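Sniffing out file links in body text might look something like this sketch. The extension list is only a starting guess; extend it to match what actually lives on your old site.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed set of "file" extensions; adjust for your own content.
FILE_EXTENSIONS = {".pdf", ".ppt", ".pptx", ".doc", ".docx",
                   ".xls", ".xlsx", ".zip", ".mp4", ".mov"}


class HrefCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_file_links(body_html, extensions=FILE_EXTENSIONS):
    """Return hrefs whose path ends in one of the given file extensions."""
    collector = HrefCollector()
    collector.feed(body_html)
    collector.close()
    found = []
    for href in collector.hrefs:
        # Look only at the path, so "?v=2" or "#page3" does not hide the type.
        path = urlparse(href).path
        if PurePosixPath(path).suffix.lower() in extensions:
            found.append(href)
    return found
```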

Character Encoding, copy cleansing, etc.

Are your line breaks Windows style (\r\n) or Unix style (\n)? Is your character encoding different from one database to another? Are line breaks coming in as HTML markup, or did your old system or WYSIWYG auto-convert non-printable characters to markup?
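A small normalizing step early in the conversion helps here. The sketch below assumes the old data arrives as raw bytes in Windows-1252 (a common legacy default, but check your own source) and converts everything to Unix line breaks.

```python
def normalize_text(raw_bytes, source_encoding="cp1252"):
    """Decode legacy bytes and standardize line breaks to "\n".

    Undecodable bytes become U+FFFD so they are easy to search for later.
    """
    text = raw_bytes.decode(source_encoding, errors="replace")
    # Handle Windows (\r\n) first, then any stray old-Mac (\r) breaks.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

From there the text is an ordinary Python string, which you can write out as UTF-8 or whatever the new database expects.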

How clean is your copy? Do old HTML formatting tags and SPAN tags need to be removed or made consistent? How about old embedded tables and divs? Now’s the time to clean this stuff out. How about (gasp) embedded PHP or JS or video players? Social media widgets? Don’t let old markup break your new site. No amount of design work or planning can account for a terrible user experience caused by nasty stuff like this.
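For copy cleansing, an allowlist tends to be safer than trying to list everything bad. This is a minimal sketch using Python's standard HTML parser: tags on the list are kept, other tags are dropped but their text stays, and script, style, iframe, and object blocks go away entirely. The allowlist itself is an assumption, and the sketch does not vet `href` values, so treat it as a starting point rather than a security-grade sanitizer.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"p", "a", "em", "strong", "ul", "ol", "li", "blockquote", "br"}
ALLOWED_ATTRS = {"a": {"href"}}
DROP_WITH_CONTENT = {"script", "style", "iframe", "object"}
VOID_TAGS = {"br"}


class MarkupCleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped block

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            allowed = ALLOWED_ATTRS.get(tag, set())
            attr_text = "".join(
                f' {name}="{escape(value, quote=True)}"'
                for name, value in attrs
                if name in allowed and value is not None
            )
            self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data, quote=False))


def clean_markup(dirty_html):
    cleaner = MarkupCleaner()
    cleaner.feed(dirty_html)
    cleaner.close()
    return "".join(cleaner.out)
```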

See how much there is behind a solid, successful data migration? The more time and effort you spend on this phase of your next migration, the better your site launch is going to be!
