The Reason Bing Cannot Keep Up With Google? - @bing
By xfernal on Oct 24, 2009 12:03 AM • Rank (1585) • Views 1562

I have been using Bing as my default search provider for a while now, and I do like the relevancy of the results. However, I consistently have to fall back on Google to find newer and more numerous results. My searches are mainly work related: .Net error messages, namespace references, example code, server configurations, SQL scripts, etc. Bing is great for returning MSDN and developer related data, and the results on the first page are always very relevant, but there simply are not as many results, and relevancy gets poor very quickly after the first page. What I have noticed is that Google has more and newer results that are relevant, while Bing still has not crawled or indexed the data I am looking for.

Now I have thought about this a lot, and I believe the answer is really quite simple: Bing does not crawl sites as much or as often as Google. Before I show the evidence behind my rant, I do want to say I LOVE BING! However, I think the web logs speak for themselves…
I have run reports on IIS web logs for years, and Googlebot has always been the #1 spider on every site whose logs I have reviewed. As a web developer, I can assure you this is a very large sample of sites of different sizes, from very large to very small, and not some off the cuff claim that cannot be supported. While most of these sites have been on Windows based servers, the sites on other operating systems have had very similar results.
The following statistics are for 3 different sites, all running the DotNetNuke web platform on Windows servers from 3 different hosting providers. Each report covers the last 30 days and includes the top 10 spiders. I will refrain from naming the domains of these sites for several reasons, mainly because I am not trying to promote anyone with this post, but I am confident that 99% of all sites on the Internet would report similar results. I have also run these same reports against all web logs from all dates, and the all-time results are always worse for MSN Robot than the last 30 days, so there is some promise that it is getting better. I would be very surprised to hear someone say otherwise, so please correct me if I am wrong.
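For readers who want to run a similar report on their own logs, here is a minimal sketch in Python. It is not the reporting tool behind the numbers in this post. It assumes logs in the IIS W3C extended format (a `#Fields:` directive naming the columns, space separated values, and spaces in the user agent written as `+`). The `SPIDER_TOKENS` map and the sample log lines are made up purely for illustration.

```python
from collections import Counter

# Hypothetical substring -> label map. The tokens are assumptions about how
# each crawler identifies itself in the User-Agent header.
SPIDER_TOKENS = {
    "googlebot": "Googlebot",
    "msnbot": "MSN Robot",
    "slurp": "Yahoo! Slurp",
}


def count_spider_hits(lines):
    """Count hits per known spider in IIS W3C extended log lines."""
    counts = Counter()
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive lists the column names for the rows that follow.
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#") or not fields:
            continue  # skip other directives, blanks, rows before #Fields
        row = dict(zip(fields, line.split()))
        agent = row.get("cs(User-Agent)", "").lower()
        for token, label in SPIDER_TOKENS.items():
            if token in agent:
                counts[label] += 1
                break
    return counts


# Invented sample data, not taken from any of the sites discussed here.
SAMPLE_LOG = """\
#Software: Microsoft Internet Information Services 6.0
#Fields: date time cs-method cs-uri-stem c-ip cs(User-Agent) sc-status
2009-10-24 00:05:11 GET /Default.aspx 192.0.2.10 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200
2009-10-24 00:07:42 GET /Default.aspx 192.0.2.10 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200
2009-10-24 12:35:03 GET /Default.aspx 192.0.2.20 msnbot/2.0b+(+http://search.msn.com/msnbot.htm) 200
2009-10-24 12:40:00 GET /Default.aspx 192.0.2.30 Mozilla/4.0+(compatible;+MSIE+8.0;+Windows+NT+6.1) 200
""".splitlines()

hits = count_spider_hits(SAMPLE_LOG)
for spider, n in hits.most_common():
    print(f"{spider}: {n}")
```

Matching on user agent substrings is crude, since anyone can send a spoofed user agent, so treat counts like these as approximate.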
Site #1:
The first example is a small site on a very niche subject. The domain has been around for about 5 years, and the content is updated roughly daily.
Summary

Hits
  Total Hits                      340,121
  Visitor Hits                    260,092
  Spider Hits                     80,029
  Average Hits per Day            11,337
  Average Hits per Visitor        24.80
  Cached Requests                 28,832
Page Views
  Total Page Views                67,545
  Average Page Views per Day      2,251
  Average Page Views per Visitor  6.44
Visitors
  Total Visitors                  10,486
  Average Visitors per Day        349
  Total Unique IPs                5,291

[Chart: Daily Spider Activity]

Top Spiders

Rank  Hits     Spider
1     24,103   Googlebot
2     22,107   Yahoo! Slurp
3     18,436   MSN Robot
4     2,800    DotBot
5     2,167    Google Feedfetcher
6     1,710    Exabot
7     1,661    KaloogaBot
8     1,416    InternetSeer
9     1,191    Sosospider
10    762      Cuil Robot

There have been 80,029 spider hits in total in the last 30 days (see Summary). Of that, Google accounts for 30.1%, while MSN Robot, aka Bing, has 23.0%. That is a difference of 5,667 hits in the last 30 days between Googlebot and MSN Robot. While that may not seem like a big difference, one has to consider that the site is small and does not have as many pages as larger sites.
Doing a query against Bing for “site:domain” returns 1,330 results. The same query against Google returns 2,280 results, a difference of 950 indexed pages.
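The percentages above are simple arithmetic on the report numbers. As a quick sanity check, here is a small Python sketch using the Site #1 figures; the function name `spider_share` is just an illustrative choice.

```python
def spider_share(hits, total_spider_hits):
    """Return a spider's share of all spider hits, as a percentage."""
    return round(100.0 * hits / total_spider_hits, 1)


# Site #1 figures from the Summary and Top Spiders reports.
TOTAL_SPIDER_HITS = 80_029
GOOGLEBOT_HITS = 24_103
MSN_ROBOT_HITS = 18_436

google_share = spider_share(GOOGLEBOT_HITS, TOTAL_SPIDER_HITS)
msn_share = spider_share(MSN_ROBOT_HITS, TOTAL_SPIDER_HITS)
gap = GOOGLEBOT_HITS - MSN_ROBOT_HITS

print(f"Googlebot: {google_share}%")  # 30.1%
print(f"MSN Robot: {msn_share}%")     # 23.0%
print(f"Difference: {gap:,} hits")    # 5,667 hits
```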
Site #2:
This site is new and medium in size. The domain is a little over 6 months old. It is a very active social networking and knowledge base site and is growing extremely fast. There are many new rich content pages created every day, managed by a large community of moderators and users, posting relevant information about a niche subject.
Summary

Hits
  Total Hits                      9,527,548
  Visitor Hits                    8,960,533
  Spider Hits                     567,015
  Average Hits per Day            317,584
  Average Hits per Visitor        56.52
  Cached Requests                 1,387,293
Page Views
  Total Page Views                2,100,505
  Average Page Views per Day      70,016
  Average Page Views per Visitor  13.25
Visitors
  Total Visitors                  158,550
  Average Visitors per Day        5,285
  Total Unique IPs                27,959

[Chart: Daily Spider Activity]

Top Spiders

Rank  Hits     Spider
1     211,662  Googlebot
2     165,779  MSN Robot
3     92,098   Yahoo! Slurp
4     35,995   Google Feedfetcher
5     9,564    Cuil Robot
6     9,338    MJ12bot
7     8,920    Sosospider
8     8,191    Baidu Spider
9     8,060    Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1; aggregator:Spinn3r (Spinn3r 3.1); http://spinn3r.com/robot) Gecko/20021130
10    3,402    R6_FeedFetcher(www.radian6.com/crawler)

There have been 567,015 spider hits in total in the last 30 days. Of that, Google accounts for 37.3%, while MSN Robot has 29.2%. That is a difference of 45,883 hits in the last 30 days.
Doing a query against Bing for “site:domain” returns 2,330 results. The same query against Google returns 8,430 results. That is over 3 times as many indexed pages.
Now what if I told you that this was a Microsoft related website run by a bunch of MVPs? As a developer of crawlers myself, I would think that Microsoft has built preferential treatment for “Microsoft” related sites into the brains of their crawling machine. I certainly have added such logic to our Seamus crawler and would be disappointed if MSN Robot did not have such intellect.
 
Site #3:
The third site is a very large site, but with less traffic. The domain is a little over 1 year old and gets many new pages of content every day. This site has been heavily using social media in the last 30 days, and overall traffic is growing quickly. It is a social network covering a broad range of subjects.
Summary

Hits
  Total Hits                      3,140,963
  Visitor Hits                    2,956,206
  Spider Hits                     184,757
  Average Hits per Day            104,698
  Average Hits per Visitor        60.20
  Cached Requests                 392,956
Page Views
  Total Page Views                119,622
  Average Page Views per Day      3,987
  Average Page Views per Visitor  2.44
Visitors
  Total Visitors                  49,108
  Average Visitors per Day        1,636
  Total Unique IPs                29,789

[Chart: Daily Spider Activity]

Top Spiders

Rank  Hits     Spider
1     139,644  Googlebot
2     19,713   Yahoo! Slurp
3     6,156    Google AdSense Robot
4     5,203    Cuil Robot
5     4,041    MSN Robot
6     2,001    Baidu Spider
7     1,721    Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1; aggregator:Spinn3r (Spinn3r 3.1); http://spinn3r.com/robot) Gecko/20021130
8     1,412    Sosospider
9     901      AskJeeves Robot
10    878      DotBot

How about the Googlebot trend in Daily Spider Activity! I would say Google likes this site a lot, and the social media has definitely had a positive effect. Will MSN Robot start answering the call? I keep on hoping, but refuse to hold my breath.
Reviewing the all time spider access to the site since its inception shows MSN Robot even lower, in 7th place, so it is a promising trend to see that it has moved up 2 places in the last year. However, let’s look at the numbers here for the last 30 days…
There have been 184,757 spider hits in total in the last 30 days (see Summary). Of that, Google accounts for 75.6%. MSN Robot has a mere 2.2% of those spider hits!
Doing a query against Bing for “site:domain” returns 3,450 results. The same query against Google returns 23,800 results, a difference of 20,350 indexed pages. This is a staggering difference!
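To put the three “site:domain” comparisons side by side, here is a short Python sketch that expresses Bing's result count as a percentage of Google's for each site. The counts are the ones quoted in this post; the `coverage` helper is an illustrative name, and “site:” result counts are estimates reported by the search engines, so the ratios are only rough.

```python
# (Bing results, Google results) for "site:domain", as quoted in this post.
INDEXED = {
    "Site #1": (1_330, 2_280),
    "Site #2": (2_330, 8_430),
    "Site #3": (3_450, 23_800),
}


def coverage(bing, google):
    """Bing's result count as a percentage of Google's."""
    return round(100.0 * bing / google, 1)


for site, (bing, google) in INDEXED.items():
    print(
        f"{site}: Bing {bing:,} vs Google {google:,} "
        f"({coverage(bing, google)}% of Google, gap {google - bing:,})"
    )
```

On these numbers, the gap widens as the sites get larger: roughly 58% for the small site, 28% for the medium one, and 14% for the large one.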
Okay, enough with the statistics, so what’s my point?
Consistently, Google out-crawls MSN Robot, hands down, and has done so for years. While some may argue that most of what is being crawled by Google is crap/spam, the fact is that even for valid, content rich sites, MSN Robot cannot keep up with the Googlebot crawlers. What this means is that great content is not getting indexed and is never returned in the search results from Bing. This also explains my disappointment at having to do a Google search as well when looking for that hard to find solution to my programming issue. I really want to just use Bing!
Now I am not bashing Microsoft or Bing. I love what Bing is and its potential for the future. I desperately hope that Bing can seriously compete with Google. I develop .Net code all day and most of the night, and as a developer of spiders and crawlers for the last 8 years, I know a lot about how these aggregating/crawling/spidering applications work. I have personally written about 10 variations of spiders myself, including Seamus. Now I would never bite the hand that feeds me, but come on! If Bing wants to be a serious competitor in the search engine market, I think it is time for an overhaul of the crawling process.
My hope is that Bing already has a plan for this. Since I have no inside connections, I can only speculate, but maybe with the new Yahoo! business relationship there are plans to also use Yahoo! Slurp for crawling? Looking at the above statistics, you can see that Yahoo! Slurp out-crawls MSN Robot in 2 out of the 3 reports. I believe that is fairly consistent with most web reports as well. Maybe this is already in place. Who knows?
What is the solution?
Now of course, I do not have access to the MSN Robot source, nor do I know how the queue system was architected, and I would love to see it! But, assuming it was architected to be distributed, and I would be sorely disappointed if not, I think there simply are not enough MSN Robots running. I will not quote any numbers, but I read in Wired magazine several years ago that most of Google's servers are small, but there are a LOT of them. Now I have no idea how many are dedicated to crawling sites, but in past site logs, there were certainly more unique IP addresses coming from Googlebot than from MSN Robot. Recently, however, I have seen more unique IP addresses coming from MSNBot than Googlebot. This makes the situation more confusing.
So, is it the way MSN Robot is architected? Maybe.
Or, is it that Microsoft has foolishly neglected to throw major sums of cash at the acquisition of a global army of crawling servers running MSN Robot? Most likely. But that is not apparent from the logs.
I am sure there are factors that I cannot even comprehend and without full knowledge of the infrastructure, I am for the most part in the dark, but if Bing has any chance of search engine dominance, I surely hope they have a future plan for overhauling their current crawling and aggregating process. Without it, the majority of folks will still be querying Google to find what they are looking for.
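For anyone curious about the unique IP comparison mentioned above, here is a rough Python sketch of how one might count distinct client IPs per crawler. It assumes you have already extracted (client IP, user agent) pairs from your logs; the function name, the token list, and the sample records are all invented for illustration, and a spoofed user agent would inflate the counts.

```python
from collections import defaultdict


def unique_ips_per_spider(records, tokens):
    """Count distinct client IPs per crawler token.

    records: iterable of (client_ip, user_agent) pairs.
    tokens: lowercase substrings assumed to identify each crawler.
    """
    ips = defaultdict(set)
    for client_ip, user_agent in records:
        agent = user_agent.lower()
        for token in tokens:
            if token in agent:
                ips[token].add(client_ip)
                break
    return {token: len(addresses) for token, addresses in ips.items()}


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
MSNBOT_UA = "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"

# Invented records using documentation-only IP ranges.
SAMPLE_RECORDS = [
    ("192.0.2.1", GOOGLEBOT_UA),
    ("192.0.2.1", GOOGLEBOT_UA),  # repeat visit from the same address
    ("192.0.2.2", GOOGLEBOT_UA),
    ("198.51.100.7", MSNBOT_UA),
]

ip_counts = unique_ips_per_spider(SAMPLE_RECORDS, ["googlebot", "msnbot"])
print(ip_counts)
```

A count like this only says how many addresses a crawler visits from, not how many machines or how much crawling capacity sits behind them.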
Comments (4)
xfernal wrote: on Oct 24, 2009 02:21 AM
To drive my point even further, I see Google indexed this page over 2 hours ago: http://www.google.com/search?hl=en&rls=com.microsoft%3Aen-us&q=Bing+Cannot+Keep+Up+With+Google&aq=f&oq=&aqi=
xfernal wrote: on Oct 24, 2009 02:28 AM
I just looked at the logs, and Google crawled this page within 5 minutes of it being posted. MSNBot still has not shown up!
xfernal wrote: on Oct 24, 2009 02:23 PM
I just looked at the logs and MSNbot crawled this page 12.5 hours after Googlebot. But, here's the kicker. Googlebot had crawled the site twice more during that 12 hour period.
James Wallace wrote: on Oct 25, 2009 01:12 AM
wow.