Hey—we've moved. Visit The Keyword for all the latest news and stories from Google

Official Blog

Insights from Googlers into our products, technology, and the Google culture

Unicode over 60 percent of the web

February 3, 2012
*Your mileage may vary: these figures may vary somewhat from what other search engines find. The graph lumps together encodings by script. We detect the encoding for each webpage; the ASCII pages just contain ASCII characters, for example. Thanks again to Erik van der Poel for collecting the data.

As you can see, Unicode has experienced an 800 percent increase in “market share” since 2006. Note that we separate out ASCII (~16 percent) since it is a subset of most other encodings. When you include ASCII, nearly 80 percent of web documents are in Unicode (UTF-8). The more documents that are in Unicode, the less likely you will see mangled characters (what Japanese call mojibake) when you’re surfing the web.

We’ve long used Unicode as the internal format for all the text Google searches and process: any other encoding is first converted to Unicode. Version 6.1 just released with over 110,000 characters; soon we’ll be updating to that version and to Unicode’s locale data from CLDR 21 (both via ICU). The continued rise in use of Unicode makes it even easier to do the processing for the many languages that we cover. Without it, our unified index it would be nearly impossible—it’d be a bit like not being able to convert between the hundreds of currencies in the world; commerce would be, well, difficult. Thanks to Unicode, Google is able to help people find information in almost any language.

Posted by Mark Davis, International Software Architect
Share on Google+ Share on Twitter Share on Facebook
Google
  

Labels


  • Africa 19
  • Android 58
  • April 1 4
  • Asia 39
  • Europe 46
  • Latin America 18
  • accessibility 41
  • acquisition 26
  • ads 131
  • apps 418
  • books + book search 48
  • commerce 12
  • computing history 7
  • crisis response 33
  • culture 12
  • developers 120
  • diversity 35
  • doodles 68
  • education and research 144
  • entrepreneurs at Google 14
  • faster web 16
  • free expression 61
  • google.org 73
  • googleplus 50
  • googlers and culture 202
  • green 102
  • maps and earth 194
  • mobile 124
  • online safety 19
  • open source 19
  • photos 39
  • policy and issues 139
  • politics 71
  • privacy 66
  • recruiting and hiring 32
  • scholarships 31
  • search 505
  • search quality 24
  • search trends 118
  • security 36
  • small business 31
  • user experience and usability 41
  • youtube and video 140


Archive


  •     2016
    • Nov
    • Oct
    • Sep
    • Aug
    • Jul
    • Jun
    • May
    • Apr
    • Mar
    • Feb
    • Jan
  •     2015
    • Dec
    • Nov
    • Oct
    • Sep
    • Aug
    • Jul
    • Jun
    • May
    • Apr
    • Mar
    • Feb
    • Jan
  •     2014
    • Dec
    • Nov
    • Oct
    • Sep
    • Aug
    • Jul
    • Jun
    • May
    • Apr
    • Mar
    • Feb
    • Jan
  •     2013
    • Dec
    • Nov
    • Oct
    • Sep
    • Aug
    • Jul
    • Jun
    • May
    • Apr
    • Mar
    • Feb
    • Jan
  •     2012
    • Dec
    • Nov
    • Oct
    • Sep
    • Aug
    • Jul
    • Jun
    • May
    • Apr
    • Mar
    • Feb
    • Jan
  •     2011
    • Dec
    • Nov
    • Oct
    • Sep
    • Aug
    • Jul
    • Jun
    • May
    • Apr
    • Mar
    • Feb
    • Jan
  •     2010
    • Dec
    • Nov
    • Oct
    • Sep
    • Aug
    • Jul
    • Jun
    • May
    • Apr
    • Mar
    • Feb
    • Jan
  •     2009
    • Dec
    • Nov
    • Oct
    • Sep
    • Aug
    • Jul
    • Jun
    • May
    • Apr
    • Mar
    • Feb
    • Jan
  •     2008
    • Dec
    • Nov
    • Oct
    • Sep
    • Aug
    • Jul
    • Jun
    • May
    • Apr
    • Mar
    • Feb
    • Jan
  •     2007
    • Dec
    • Nov
    • Oct
    • Sep
    • Aug
    • Jul
    • Jun
    • May
    • Apr
    • Mar
    • Feb
    • Jan
  •     2006
    • Dec
    • Nov
    • Oct
    • Sep
    • Aug
    • Jul
    • Jun
    • May
    • Apr
    • Mar
    • Feb
    • Jan
  •     2005
    • Dec
    • Nov
    • Oct
    • Sep
    • Aug
    • Jul
    • Jun
    • May
    • Apr
    • Mar
    • Feb
    • Jan
  •     2004
    • Dec
    • Nov
    • Oct
    • Sep
    • Aug
    • Jul
    • Jun
    • May
    • Apr

Feed

Googleon Google+
Follow
Instagram
Give us feedback in our
Product Forums.

Company-wide

  • Public Policy Blog
  • Research Blog
  • Student Blog

Products

  • Official Android Blog
  • Chrome Blog
  • Lat Long Blog

Developers

  • Developers Blog
  • Ads Developer Blog
  • Android Developers Blog
  • Google
  • Privacy
  • Terms