A well-structured taxonomy acts as an organizational backbone on any content-heavy website, so it's something you want to get right and plan for. At SGC we typically use taxonomy to flag a content piece by its related topic(s). For example, take a content piece about "Green Building". By tagging this piece with the taxonomy term "Green Building", we're using that taxonomy term to define, to some degree:

1. Sameness/likeness: This piece is like other pieces based on this same same topic. We might suggest further reading on this topic and pull similar articles that have the same taxonomy term. Certainly the trend of "Green Building" is different than buildings that are green in color so this taxonomy relationship is key in this case for establishing similarity

2. Ad Targeting: We may use Taxonomy in some cases for displaying the correct/appropriate advertising messages to our readers

3. Where stuff shows up on the site: We often use topical aggregation pages to group content by taxonomy term. This gives our readers topical "one-stop-shops" for important industry topics. We might also use taxonomy-filtered results to populate sidebars and blocks throughout the site.

4. What stuff is: we try to use content types rather than taxonomy for this purpose, however, in some cases taxonomy is the only way to flag a content item as one "type" of content or another (news item vs. feature vs. white paper, for example)

However you plan to use or already use taxonomy in your Drupal site, things change over time. Vocabulary sets (these are groupings of taxonomy terms in Drupal) get diluted, mixed up, washed out, and overblown. Remember that intern a couple summers ago that entered all those redundant terms? How about that amazing new module which piggybacked on your topical vocabulary and added a whole bunch of useless terms? Perhaps as time passed, certain terms became less useful to readers or were replaced with newer terms. Maybe you're re-structuring your site, or, rebuiding it and want to do some housecleaning first. You need to get a handle on your taxonomy and clean house.

There is a module for this out there already but Drupal 6 support is "experimental", and, I prefer to get my hands dirty with this stuff by using straight SQL updates. In many cases I'm merging hundreds of terms at a time, so by scripting these merges with SQL I save lots of time as well. The SQL examples I'm giving here are Drupal 6 specific, but the general concepts are applicable to Drupal 7 as well. These mass updates are like taking a 12-foot chainsaw through your data so be sure you have backups. I typically create a copy of each table I'm going to change in my database for super-simple rollbacks if needed.

In a vocab cleanup/merging situation we want to do the following things:

1. Export the current vocab set and content count for each term in order to give the editorial staff a chance to re-map the terms in Excel (or Google Docs, even better)

2. Update the term-to-node relationships from old term to new or existing target term based on the spreadsheet

3. Delete old terms

4. Create "301" redirects from old term to new term for the taxonomy aggregation/display pages

5. Delete the old term aliases

For step 1, I want to give our editorial staff a nice, clean spreadsheet that shows all the terms in a particular vocabulary PLUS a content count for each term. That way, they can see which terms are used the most throughout the site and decide which ones to prune and merge accordingly. I'll add one blank column to the spreadsheet called "destination_id" so that they can put the term ID of the destination term in that column. Once they are done, I'll take their spreadsheet and use it to build my update, delete, and insert queries. So, for step 1, fire up your favorite MySQL interface (I use MySQLWorkbench) and run this query. Export the results, import into Excel or Google Docs or whatever you use for spreadsheets, and get your terms organized.

 

SELECT TD.tid, TD.name, COUNT(TN.nid) AS NbrOfPieces, ' ' AS destination_id FROM
term_data TD LEFT JOIN term_node TN
ON TD.tid = TN.tid
WHERE TD.vid = 1 --(Change value here to match the vocabulary ID you want to use)
GROUP BY TD.tid, TD.name
ORDER BY TD.name
 
This ends "Part 1" of this blog post- stay tuned for Part 2 where we'll dive into taking action on the taxonomy worksheet and working our database over with it.
Blog Author ID: