Character Sets: We know we need an encoding tag for compliance purposes, but which one?
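If all a site is doing is serving content, using UTF-8 is fine. By the way, it is ASCII that is a true subset of UTF-8. ISO-8859-1 lines up with the first 256 Unicode code points, but its characters above 0x7F (accented letters, for example) take two bytes in UTF-8, so the two encodings are not byte-compatible.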
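To see how the two encodings relate at the byte level, it helps to encode the same text both ways. The following is a minimal Python sketch; the sample strings are illustrative and not taken from any real page.

```python
# Encode the same text under ISO-8859-1 and UTF-8 and compare the bytes.
ascii_text = "Hello"      # plain ASCII, illustrative sample
accented_text = "café"    # contains one character above 0x7F

ascii_latin1 = ascii_text.encode("iso-8859-1")
ascii_utf8 = ascii_text.encode("utf-8")
accented_latin1 = accented_text.encode("iso-8859-1")
accented_utf8 = accented_text.encode("utf-8")

# Plain ASCII text produces identical bytes in both encodings.
ascii_bytes_match = ascii_latin1 == ascii_utf8

# Accented text does not: one byte in ISO-8859-1, two bytes in UTF-8.
accented_bytes_match = accented_latin1 == accented_utf8

print(ascii_bytes_match)      # True
print(accented_latin1)        # b'caf\xe9'
print(accented_utf8)          # b'caf\xc3\xa9'
print(accented_bytes_match)   # False
```

A page that contains only ASCII characters is therefore byte-for-byte the same under either declaration; the difference only shows up once accented or non-Latin characters appear.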
Where you can get into trouble is when you are *accepting* data from visitors in other countries. On a page declared as UTF-8, the browser will encode whatever the user enters as UTF-8 and send it back to the server. If your server can't handle UTF-8 encoded data, you could run into trouble. For example, if you are using a database backend and a user from China enters some Chinese characters into a form, your database needs to know how to handle the UTF-8 encoded data it just received.
Any database will accept UTF-8 data because it is "byte oriented", but knowing what to do with that data is something completely different. If the bytes are stored or read back under the wrong character set, the text comes out garbled, and the worst case is that you lose data from foreign customers.
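The failure modes described above can be sketched in a few lines of Python. This only simulates a backend by decoding and encoding bytes directly; the sample text and variable names are illustrative, and no real database is involved.

```python
# Bytes as a browser would submit them from a UTF-8 page (illustrative sample).
original = "中文"
submitted = original.encode("utf-8")

# A backend that assumes ISO-8859-1 decodes the bytes without raising an
# error, but every byte becomes its own (wrong) character.
misread = submitted.decode("iso-8859-1")

# A backend that can only store ISO-8859-1 has no representation for these
# characters; with errors="replace" each one is replaced by "?".
lossy = original.encode("iso-8859-1", errors="replace")

# Decoding with the charset the browser actually used round-trips cleanly.
correct = submitted.decode("utf-8")

print(len(submitted))       # 6 bytes for 2 characters
print(len(misread))         # 6 garbled characters
print(lossy)                # b'??'
print(correct == original)  # True
```

The first failure is silent, which is why it tends to go unnoticed until the garbled text is displayed; the second is where data is actually lost.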
The search engines, and especially Google, may have an easier time indexing a page with the ISO-8859-1 statement, as Googlebot will not have to convert any data, which can be a lengthy process. The reasoning is that, since the index is housed on US servers, the UTF-8 statement forces a conversion step even when the page contains no non-Latin characters. Essentially, you are making Googlebot work harder than it needs to.
Recommendation: Use the ISO-8859-1 tag on all US-based pages that will NOT handle foreign characters (that is, pages without forms). For pages that do serve or accept foreign characters, use the UTF-8 tag. Use the UTF-8 tag on your form pages and place all pages with forms in a separate folder, then exclude that folder from indexing in your robots.txt file.
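One way to check that such an exclusion behaves as intended is to feed the rules to Python's standard `urllib.robotparser`. In this sketch the folder name `/forms/` and the page paths are hypothetical placeholders for wherever the form pages actually live.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: "/forms/" stands in for the folder
# that holds the UTF-8 form pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forms/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Form pages should be blocked; ordinary content pages should not be.
form_allowed = parser.can_fetch("Googlebot", "/forms/contact.html")
content_allowed = parser.can_fetch("Googlebot", "/articles/index.html")

print(form_allowed, content_allowed)
```

The two `User-agent` and `Disallow` lines are the part that would go into the site's robots.txt; the rest only verifies them.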
Why You Should Do This: Using the ISO-8859-1 tag on your content pages lets the spiders index your site quickly and correctly while still letting the browser display your pages properly. Using the UTF-8 tag on your forms lets any foreign data be passed to your database correctly.
- <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
- <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">