dth by Joshua Wood

Google: The Duplicate Content Myth

Greg Grothaus of Google's Search quality team posted a video on the Google Webmaster Central Blog dispelling the duplicate content penalty myth. The video is a reproduction of a talk he gave at the Search Engine Strategies conference in San Jose last month on Duplicate Content and Multiple Site Issues.

In the video, Greg explains that Google does not automatically penalize you for having duplicate content on your web site, as many have believed. The myth took hold largely because of Google's search feature that hides similar pages from the user. However, it has never been Google's intention to penalize well-meaning webmasters who happen to have multiple copies of the same page by accident. Anyone who has been developing web sites for a while (specifically dynamic ones) will tell you that it's quite common to have several different variations of the same URL. Greg crystallizes this with the following example:

These URLs are all different:

  • example.com/
  • example.com/?
  • example.com/index.html
  • example.com/Home.aspx
  • www.example.com/
  • www.example.com/?
  • www.example.com/index.html
  • www.example.com/Home.aspx

The URLs are all slightly different, but they all display the home page of example.com, which is obviously not duplicate content. Google, in its infinite wisdom, understands this and will even attempt to pick the best URL and combine all of the extras into one listing in search results.
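To make the idea of folding these variants into one listing concrete, here is a minimal shell sketch that maps all eight example URLs onto a single form. It only illustrates the concept and says nothing about how Google actually does it. The function name canonicalize, and the rule that index.html and Home.aspx both stand for the home page, are assumptions made for this example.

```shell
# Hypothetical helper: reduce the example URL variants to one canonical form.
canonicalize() {
  # 1. drop a leading "www."
  # 2. drop a trailing empty query string ("?")
  # 3. treat index.html and Home.aspx as the directory root (assumption)
  printf '%s\n' "$1" | sed -E \
    -e 's#^www\.##' \
    -e 's#\?$##' \
    -e 's#/(index\.html|Home\.aspx)$#/#'
}
```

Calling canonicalize on any of the eight URLs in the list above yields the same string, which is the kind of consolidation you want links and crawlers to see.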

However, the lack of a penalty for duplicate content is no excuse to become lazy about keeping our URLs and URL re-writing techniques as clean as possible. You are still at a major disadvantage if people are linking to different copies of the same page, because the link juice that could be captured by one single URL on your web site is now being dispersed among two or more. Greg rightly states that if you have two identical pages with slightly different URLs, and 10 people are linking to one and 10 to the other, your listing is going to have half the rank from incoming links that it should. This is called dilution of link popularity. In addition to problems with linking, multiple URLs can also result in user-unfriendly URLs in search results, as well as inefficient crawling by search engines: you want them digging for new content, not re-reading the same thing.
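The 10-and-10 arithmetic can be checked with a small shell sketch. It is only a tally of link targets, not a model of how any search engine computes rank. The link list is made up for the illustration, and the function names are hypothetical.

```shell
# Hypothetical inbound links: 10 to each of two URLs that serve the same page.
links() {
  i=0
  while [ "$i" -lt 10 ]; do
    echo "example.com/"
    echo "www.example.com/"
    i=$((i + 1))
  done
}

# Tally links per URL exactly as written: the popularity is split in two.
count_per_url() {
  links | sort | uniq -c | awk '{print $2, $1}'
}

# Tally links after folding the www variant away: everything lands on one URL.
count_per_canonical() {
  links | sed 's#^www\.##' | sort | uniq -c | awk '{print $2, $1}'
}
```

count_per_url reports two URLs with 10 links each, while count_per_canonical reports a single URL with all 20.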

Joomla 1.0 to 1.5 Migration Issues

A good number of my clients run Joomla on their web sites, many of them large organization web sites with hundreds of articles and some of the most data-intensive extensions. I followed the development progress of Joomla 1.5 for several months after its release, but I was still chained to the 1.0.x series for most existing sites because compatible extensions hadn't been released. Now that most of the extensions that matter are at the very least compatible in legacy mode, I've been slowly working over the summer to get my clients migrated from the 1.0.x series to 1.5. It's not the easiest process. In fact, I think it's the most complicated, rage-inducing software upgrade I've ever had to deal with. However, it has gotten easier as I've encountered some common issues that can be (sometimes easily) avoided.

Character Encoding

By far, the most irksome (and most common) issue with Joomla 1.5 migration is encoding. Joomla 1.0.x uses Latin-1 character encoding (iso-8859-1), while 1.5 uses the more standardized UTF-8. I'm definitely not saying it's a bad thing to convert your data to UTF-8, but I will warn that it can cause a severe headache if you go at it by trial and error. Unfortunately, in my experience (so far) it will come down to some trial and error. There are, however, several steps you can take that, if performed correctly, will save you from throwing your computer out the window.

I will start by saying that so far, I have not successfully migrated data from 1.0 to 1.5 without first converting the 1.0 database to UTF-8 manually. I will go through the steps to accomplish this shortly, but I want to make the point up front because it is the recurring theme in my migration issues. The Joomla 1.5 installer does have the capacity to convert your migration files from iso-8859-1 (or another character set) to UTF-8 on the fly using PHP's iconv function, but every time I've attempted this I have ended up with major character encoding problems on my new 1.5 installation (mostly special characters and punctuation missing or replaced in the migrated content). I've also attempted to convert my migration files to UTF-8 manually with the iconv command in Unix, with similar issues. Sometimes it has worked better than others, but there always seems to be some sort of problem with it. Basically, if you can get your Joomla 1.0.x install working properly with UTF-8 encoding, migrating to Joomla 1.5 will be a much simpler endeavor.
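For reference, the manual iconv conversion mentioned above looks roughly like the sketch below, together with a quick way to check whether a file is valid UTF-8. This is the approach that gave me trouble, so treat it as an illustration rather than a recommendation. The function names are hypothetical, and the exit-status check assumes an iconv (such as GNU iconv) that fails on invalid input.

```shell
# Convert a file from ISO-8859-1 to UTF-8.
# $1 = input file (ISO-8859-1), $2 = output file (UTF-8)
latin1_to_utf8() {
  iconv -f ISO-8859-1 -t UTF-8 "$1" > "$2"
}

# Succeeds only if the file is valid UTF-8: converting UTF-8 to UTF-8
# makes iconv exit non-zero when it meets an invalid byte sequence.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}
```

Running is_valid_utf8 on a migration file before loading it into the installer is a cheap way to find out which encoding you should select.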

Converting Joomla 1.0.x to UTF-8

For the purpose of migrating, we should only have to get the database converted to UTF-8, and ensure that when we generate a migration file from the Joomla administrator, that file will be UTF-8 as well. The following steps should accomplish this. I highly recommend first creating a copy of your production Joomla installation and database to work with.

1. Create a dump file (I'll call this "joomla-latin.sql") of your existing Joomla database using phpMyAdmin or the command-line. Remember, the command for dumping a database is:

mysqldump --opt -u username -p database_name > joomla-latin.sql

2. Create an empty MySQL database (we'll call it "joomla10-utf8"). Make sure the character set is UTF-8, and use a UTF-8 collation such as utf8_general_ci.

3. Before we can import any data, we need to take the dump file we created in step one and switch it to UTF-8 as well. For this step we'll use sed from the command-line to replace every occurrence of latin1 in the file (such as the character set declared for each table) with utf8:

sed -r 's/latin1/utf8/g' joomla-latin.sql > joomla-utf8.sql

4. Now that we've got our UTF-8 dump file, we can finally import it into the new UTF-8 database that we created in step 2:

mysql -u username -p joomla10-utf8 < joomla-utf8.sql

5. We should now have a UTF-8 database which is identical to our production Joomla database, save for the encoding. To test it, open up your Joomla configuration.php and replace the old database name with the new database.

6. If you view your Joomla web site at this point, you should at least see that your data is there. However, there may be some encoding problems because Joomla still thinks it's using iso-8859-1. To fix this, open up your language file and set the _ISO definition to "charset=utf-8".

7. You may also need to modify your includes/database.php file to complete the conversion. Open the file, go to line 102, and uncomment the line marked below:

$this->_table_prefix = $table_prefix;
//@mysql_query("SET NAMES 'utf8'", $this->_resource); <-- Uncomment
$this->_ticker = 0;
$this->_log = array();

If you follow these steps exactly, you should be up and running with a workable UTF-8 version of your Joomla 1.0.x web site. Verify that your data is displaying correctly, and then you can create your migration file using the 1.0 migrator component and proceed to your Joomla 1.5 installation. When you load your migration file during the 1.5 installation, you will need to select UTF-8 as the encoding since your migration file has already been converted. If you fail to select the UTF-8 character set, the migration will fail with an iconv error (it will be trying to convert the file to UTF-8 when that is what it already is).
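Steps 1 through 4 above can be collected into one small script, sketched below under a few assumptions. The user and database names are the same placeholders used in the steps, the function names are hypothetical, and the CREATE DATABASE statement is one way to do step 2 from the command-line instead of phpMyAdmin. The backticks around the database name are needed because it contains a hyphen.

```shell
# Placeholder values; substitute your own user and database names.
DB_USER="username"
OLD_DB="database_name"
NEW_DB="joomla10-utf8"

# Step 3: rewrite every "latin1" in the dump to "utf8".
# $1 = original dump, $2 = rewritten dump
rewrite_charset() {
  sed 's/latin1/utf8/g' "$1" > "$2"
}

# Steps 1 to 4 in order. Each mysqldump/mysql call prompts for the password,
# and the chain stops at the first command that fails.
convert_database() {
  mysqldump --opt -u "$DB_USER" -p "$OLD_DB" > joomla-latin.sql &&
  mysql -u "$DB_USER" -p -e \
    "CREATE DATABASE \`$NEW_DB\` CHARACTER SET utf8 COLLATE utf8_general_ci" &&
  rewrite_charset joomla-latin.sql joomla-utf8.sql &&
  mysql -u "$DB_USER" -p "$NEW_DB" < joomla-utf8.sql
}
```

Nothing runs until you call convert_database, so you can source the file and try rewrite_charset on a copy of your dump first. Keep in mind that the sed replacement touches every occurrence of latin1, including any that happen to appear in your content.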

For more information regarding UTF-8 encoding and Joomla, check out David Gal's Joomla UTF-8 Guide.

Errors during migration

If you encounter errors during the migration process (while installing Joomla 1.5), you will have to reset your Joomla installation and start over from scratch. Unless I'm completely inept, it's not uncommon to encounter some errors during the migration (it's easy to miss something). A few things to consider when resetting your installation:

  1. Empty your Joomla 1.5 installation database, as the installer probably created some tables
  2. Delete the migration file that you uploaded via the installer or copied manually, located here: installation/sql/migration/migrate.sql
  3. If the installation completed but something went wrong, you will also have to delete or empty your configuration.php file.

One other thing is that when you complete a Joomla 1.5 installation using your 1.0 migration script, you will have to remove the installation directory before you can view the new web site to verify that your data is intact. Instead of deleting the directory, I'd recommend renaming it until you are certain you won't have to start over with your installation/migration, otherwise you will have to replace it from a fresh Joomla distribution in the event that you need to try again. In fact, it will be a good idea to keep your installation directory until you're ready to go live with your migrated Joomla 1.5 web site. One time I got to the point of updating my 1.0 template to be 1.5 compatible and realized the data wasn't quite right, and had to re-install with a fixed migration script. Luckily all I had to do was rename the directory back to "installation", and follow the steps I mentioned above before re-installing and then going back to work on my template updates.
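The file-system side of the reset, along with the rename trick, can be sketched as a few shell functions. The function names and the ".off" suffix are hypothetical choices for this example, and emptying the database is left out because it depends on how your MySQL server is set up.

```shell
# Undo a failed migration attempt so the installer can be run again.
# $1 = site root of the Joomla 1.5 installation
reset_migration() {
  site_root="$1"
  # Remove the uploaded or manually copied migration file.
  rm -f "$site_root/installation/sql/migration/migrate.sql"
  # Empty configuration.php if the installer got far enough to write it.
  if [ -f "$site_root/configuration.php" ]; then
    : > "$site_root/configuration.php"
  fi
}

# Rename the installer out of the way instead of deleting it...
park_installer() {
  mv "$1/installation" "$1/installation.off"
}

# ...and bring it back if the migration has to be repeated.
restore_installer() {
  mv "$1/installation.off" "$1/installation"
}
```

If the installer has been parked, call restore_installer before reset_migration, since the migration file lives inside the installation directory.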

These are a few of the major issues that I (and many others) have encountered while attempting to migrate Joomla 1.0 to 1.5. Luckily, the headache is definitely worth it. Joomla 1.5 has solved a lot of major complaints I had with 1.0, and is a much more mature platform all around. I've developed several 1.5 native extensions so far and am much happier with the framework than I was with 1.0.x. There is definitely more I could write on the subject of migration. I may add to this article as I have the time and as I encounter new problems, since I am still in the process of migrating a few sites myself.

Goodbye Expression Engine, Hello WordPress.

I recently got the itch to renew a never-ending project that will keep me up nights and threaten to steal my focus from money-making endeavors; therefore there is a somewhat updated look here at DTH as well as a completely different publishing platform... Don't get me wrong, I love Expression Engine. I just couldn't get over the numerous blogging features and addons that WordPress has over EE. For certain projects I would definitely pick EE over WordPress just as I would even choose Joomla over both for many applications. Anyway, recently I've developed a few web sites professionally using WP and I think that as far as blogging software is concerned, it is pretty hard to beat.

Seriously though, I have been working on DTH for two reasons: I wanted to fine-tune my recently perfected WordPress expertise, and I've been wanting to start writing again, creating personally, etc... I will however admit that a big part of the reason I want to write a blog is so I can rank it in search engines and convert some of you readers to paying customers... There, I said it.

The layout / design is still a work in progress... It has not been fully tested for browser compatibility yet. If you happen to find a bug, or even just want to criticize my poor taste, feel free to contact me (or leave a comment).