We are used to thinking of SEO in terms of keywords, blog posts and thematic linking across different parts of one website. For a large-scale online business like Microsoft, the challenge is far harder, and the solution applied there holds valuable lessons for those who do not have a business that large to optimize but who could still benefit from the same practices.

Derrick Wheeler, Senior SEO Architect at Microsoft, recently talked about the challenge posed by Microsoft’s many web products and the popularity of its sites, and explained how he dealt with sheer size and with content that is hard to categorize. At a time when a lot of user-generated content is going live on many websites, his experience is particularly useful to webmasters looking to optimize their own sites.

"It's a large complicated website where the content is generated by multiple business units in many different countries in many different languages, and you're trying to get things done within a complex, large organization, where there's just a lot of dependencies - a lot of stakeholders - a lot of different interests," explains Wheeler.

"A lot of people talk about 'content is king. Content is king,'" says Wheeler. "[But] with 'mega SEO,' structure is king because without structure, your content won't even be discovered."

"Some of the situations with our site, Microsoft.com...we'll have one million pages of navigation to get to fourteen thousand pages of content, and the way that you get to that content determines the URL of the final landing page, so every final landing page of content will have however many different ways there are of getting to it duplicated, so you know, you've got like twenty million URLs just for fourteen thousand pages," he says. "So a lot of mega SEO is about crawl efficiency - making your site more efficient for crawling and indexing."
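The duplication Wheeler describes can be illustrated with a minimal sketch. It assumes, purely for illustration, that the navigation path ends up in query parameters; the parameter names `nav`, `from` and `ref` are hypothetical and not taken from Microsoft.com. Stripping them shows how many crawlable URLs collapse onto one page of content.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that only record how the visitor navigated to the page.
NAVIGATION_PARAMS = {"nav", "from", "ref"}


def canonical_url(url: str) -> str:
    """Return the URL with navigation-only parameters removed and the rest sorted."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key not in NAVIGATION_PARAMS
    )
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def group_duplicates(urls):
    """Map each canonical URL to every crawled URL that resolves to it."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_url(url)].append(url)
    return dict(groups)


if __name__ == "__main__":
    crawled = [
        "http://example.com/page?id=7&nav=home",
        "http://example.com/page?nav=products&id=7",
        "http://example.com/page?id=8",
    ]
    for canonical, variants in group_duplicates(crawled).items():
        print(canonical, "has", len(variants), "crawlable variant(s)")
```

Run against a crawl export, a grouping like this gives a rough picture of how much crawl budget goes to duplicates rather than to distinct content.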

"One of the things that we deal with are the crawler efficiencies - things like large scale duplicate content or just junk content - outdated content - content that's been up for like five years, but the person that managed it left the company and no one took over so there's just content sitting out there that engines have to index," he continues. "We don't want that stuff surfacing. We want our new stuff, so getting rid of legacy content, trying to fix things at the platform level, so you don't continue to make the same mistakes over and over and over or just build on the issues that you have with your existing content management system."

"But one of our challenges is we have multiple content management systems," he adds. "We've got one primary for one section, another section of the site might have two or three that they use. I mean it's basically all over the board."

"I can't just go in and fix the CMS and have everything magically fixed. We have to go in and prioritize what CMS we want to try to work with," he adds.

Wheeler’s experience shows the need to keep track of content relevancy. It’s not enough, for instance, for your site to have thousands of pages of content if that content is not fresh and relevant to those who visit it. Search engines check not just visitor frequency but also visitor behaviour upon landing on a page. If a large number of visitors bounce off a page because its content is out of date, its relevancy rating in the search engine index begins to drop, which means the page then works against you. Worse, it works against any page it links to or that links to it, helping drag them down as well.

How did Wheeler deal with it? He enlisted the help of the Microsoft IT department (MSIT).

"MSIT was involved with that - our IT department," says Wheeler. "They can tell when the content that hadn't been updated in a certain amount of time and then they reached out to who were listed as the owners of that section and they contacted them and asked them if they still needed that content, and if there was no response in a certain amount of time, they would just remove it. And if they did respond then they would work out whether or not this content was still valid, and if it wasn't then they all agreed that it would be removed."

"It was a lot of email chains that I was on," he adds. "Hundreds of emails back and forth to get all this accomplished, and I think they removed probably a million, two million URLs from the site just by that one exercise."

"A lot of these pages of content weren't getting any traffic," Wheeler notes. "That was another way that we could tell that they were not really useful....We didn't go in and manually map them to any other section of the site."
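The clean-up process described above can be sketched roughly as follows. This is an illustration under stated assumptions, not MSIT's actual tooling: the one-year staleness threshold, the `Page` fields and the function names are all hypothetical, and an owner who never replies within the deadline is modelled simply as `None`.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Assumption: content untouched for a year counts as stale. The article gives no figure.
STALE_AFTER = timedelta(days=365)


@dataclass
class Page:
    url: str
    last_updated: date
    monthly_visits: int
    # True/False is the owner's answer; None means no reply before the deadline.
    owner_still_needs_it: Optional[bool] = None


def is_stale(page: Page, today: date) -> bool:
    """A page is stale when it has not been updated within the threshold."""
    return today - page.last_updated > STALE_AFTER


def decision(page: Page, today: date) -> str:
    """Return 'keep' or 'remove', following the owner-contact process."""
    if not is_stale(page, today):
        return "keep"
    if page.owner_still_needs_it is None:
        return "remove"  # no response in time, so the content goes
    return "keep" if page.owner_still_needs_it else "remove"


def removal_queue(pages: List[Page], today: date) -> List[Page]:
    """Pages to remove, lowest traffic first, since no traffic is a further signal."""
    doomed = [page for page in pages if decision(page, today) == "remove"]
    return sorted(doomed, key=lambda page: page.monthly_visits)
```

Sorting by traffic reflects Wheeler's remark that pages receiving no visits were the easiest to judge as no longer useful.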

Having a site the size of Microsoft’s has, of course, some advantages the average webmaster cannot enjoy, such as relative immunity from the effects of out-of-date content or pages that return a 404.

"I don't think an engine is going to dock us for having pages of content that were really old and not updated and removing them from our website, and the proper response for a page that no longer exists is the 404," says Wheeler. "I don't think that they would penalize us for that. I'm pretty sure of it."

"We could've gone in probably and found some that were valuable and redirected them somewhere, but in general, our site has a lot of authority just because when we launch something, we get a ton of links," he says. "You know, people - bloggers are always talking about Microsoft and all the stuff that we're doing. Our site in general has a lot of authority, so it wasn't a big priority for us at the time."

If you achieve that kind of authority with your website, the chances are that you also become largely immune to the effects of a 404 or the odd page that dates back to 1999 and has the kind of content that today only makes online visitors laugh.

When you're talking about a site the size of Microsoft.com, however, other things besides irrelevant content are likely to come into play. "That's just one aspect of mega SEO," says Wheeler. "The other would be the international piece - it's huge for us, because we have close to a hundred different countries and many different languages, and there's 23 countries that we really focus a lot on, but our content - the way we publish it basically...for Australia, their content can be in a lot of different places scattered all over our website, and it's hard for them to manage their SEO when their content's spread all over the place."

"So one of the things we've tried to do is come up with a standard international URL policy, because without that, it's hard for a country to even manage their own content," he says. "Even that's been a battle because some of the content management systems that we publish on can't conform to that structure so it's just a constant....with mega SEO it's about making small strides over time that [when] grouped together they have a really big impact."
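Wheeler does not spell out what the standard international URL policy looks like. As a hedged sketch, assume a policy in which every localized page lives under a `/language-country/` prefix such as `/en-au/`; that pattern, and the function names below, are assumptions made for illustration. A check like this would show a country team which of its URLs fall outside the agreed structure.

```python
import re
from typing import List, Optional
from urllib.parse import urlsplit

# Assumed policy: the path starts with a two-letter language and a two-letter
# country code, for example /en-au/... (hypothetical, not confirmed by the article).
LOCALE_PREFIX = re.compile(r"^/([a-z]{2})-([a-z]{2})(/|$)")


def locale_of(url: str) -> Optional[str]:
    """Return the locale prefix of a URL, or None if it breaks the policy."""
    match = LOCALE_PREFIX.match(urlsplit(url).path)
    if match is None:
        return None
    return f"{match.group(1)}-{match.group(2)}"


def nonconforming(urls: List[str]) -> List[str]:
    """List the URLs that do not follow the assumed locale-prefix policy."""
    return [url for url in urls if locale_of(url) is None]
```

With a single predictable prefix, all of a country's content sits under one part of the URL space, which is what makes it manageable for that country's team.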

Who's in charge of the whole Microsoft website? It's just Ballmer.

"There's so many different business groups and our website Microsoft.com doesn't roll up to a single person until it gets to Steve Ballmer," says Wheeler. "As soon as you break off of Steve Ballmer, you've got someone else that's responsible for MSDN TechNet. There's another business group that's responsible for the support site...so we don't have a centralized authority that manages the entire Microsoft.com domain. So it's very difficult because some businesses will make decisions on what's in their best interest, and it might not really be what's in the best interest of our site as a single domain name."

"The first thing I did was really try to draw an image (because I'm very visual) of what are all the pieces involved in order to optimize the site," says Wheeler of his approach. "And for us...there's four levels of where the SEO occurs on the site, and to support those four levels, there's a lot of what we'll call workstreams or initiatives or focus areas that support those four levels."

SEO by Level

"The first level is the site-wide SEO," explains Wheeler. "That's the crawl efficiency stuff we talked about. The next level is subsidiary level SEO, which is the international piece and working with them."

"The next is what we call site-specific so there might be an individual site on Microsoft.com - they want to do SEO...well we have three levels and they can do it themselves and we provide guidance, they can do a little bit with an agency (just have the agency do the keyword research, do some training...), or they can do a full service agency program," he continues. "And then there's the people who say, 'I want to optimize this page for this keyword'. Well, we'll give them some generic advice like, 'you should use that word on your page and you should actually think [about] more than just that page and on board to one of our site-specific programs.'"

So what lessons can you learn for your online business?

If you have a site with a large (and ever-increasing) amount of content, start with the structure. Employ a unified approach to URLs and to the CMS used to manage everything. Have a clearly defined content strategy in terms of keywords and cross-linking. Finally, make sure that someone is in charge, so that questions which arise have a central figure to go back to.

Apply all this to your site as it grows. Start a site with this kind of growth in mind. Use it to organize a site that is now growing fast with a lot of user-generated content, and what you will end up with is the kind of overall SEO structure, strategy and approach that produces a massive web presence.
