Advertisement
  1. SEJ
  2.  ⋅ 
  3. SEO

Google Explains How CDNs Impact Crawling & SEO

Google's new crawl documentation explains how to fix crawling issues related to CDNs

Google Explains How CDNs Impact Crawling & SEO

Google published an explainer that discusses how Content Delivery Networks (CDNs) influence search crawling and improve SEO but also how they can sometimes cause problems.

What Is A CDN?

A Content Delivery Network (CDN) is a service that caches a web page and displays it from a data center that’s closest to the browser requesting that web page. Caching a web page means that the CDN creates a copy of a web page and stores it. This speeds up web page delivery because now it’s served from a server that’s closer to the site visitor, requiring less “hops” across the Internet from the origin server to the destination (the site visitor’s browser).

CDNs Unlock More Crawling

One of the benefits of using a CDN is that Google automatically increases the crawl rate when it detects that web pages are being served from a CDN. This makes using a CDN attractive to SEOs and publishers who are concerned about increasing the amount of pages that are crawled by Googlebot.

Normally Googlebot will reduce the amount of crawling from a server if it detects that it’s reaching a certain threshold that’s causing the server to slow down. Googlebot slows the amount of crawling, which is called throttling. That threshold for “throttling” is higher when a CDN is detected, resulting in more pages crawled.

Something to understand about serving pages from a CDN is that the first time pages are served they must be served directly from your server. Google uses an example of a site with over a million web pages:

“However, on the first access of a URL the CDN’s cache is “cold”, meaning that since no one has requested that URL yet, its contents weren’t cached by the CDN yet, so your origin server will still need serve that URL at least once to “warm up” the CDN’s cache. This is very similar to how HTTP caching works, too.

In short, even if your webshop is backed by a CDN, your server will need to serve those 1,000,007 URLs at least once. Only after that initial serve can your CDN help you with its caches. That’s a significant burden on your “crawl budget” and the crawl rate will likely be high for a few days; keep that in mind if you’re planning to launch many URLs at once.”

When Using CDNs Backfire For Crawling

Google advises that there are times when a CDN may put Googlebot on a blacklist and subsequently block crawling. This effect is described as two kinds of blocks:

1. Hard blocks

2. Soft blocks

Hard blocks happen when a CDN responds that there’s a server error. A bad server error response can be a 500 (internal server error) which signals a major problem is happening with the server. Another bad server error response is the 502 (bad gateway). Both of these server error responses will trigger Googlebot to slow down the crawl rate. Indexed URLs are saved internally at Google but continued 500/502 responses can cause Google to eventually drop the URLs from the search index.

The preferred response is a 503 (service unavailable), which indicates a temporary error.

Another hard block to watch out for are what Google calls “random errors” which is when a server sends a 200 response code, which means that the response was good (even though it’s serving an error page with that 200 response). Google will interpret those error pages as duplicates and drop them from the search index. This is a big problem because it can take time to recover from this kind of error.

A soft block can happen if the CDN shows one of those “Are you human?” pop-ups (bot interstitials) to Googlebot. Bot interstitials should send a 503 server response so that Google knows that this is a temporary issue.

Google’s new documentation explains:

“…when the interstitial shows up, that’s all they see, not your awesome site. In case of these bot-verification interstitials, we strongly recommend sending a clear signal in the form of a 503 HTTP status code to automated clients like crawlers that the content is temporarily unavailable. This will ensure that the content is not removed from Google’s index automatically.”

See also: 9 Tips To Optimize Crawl Budget For SEO

Debug Issues With URL Inspection Tool And WAF Controls

Google recommends using the URL Inspection Tool in the Search Console to see how the CDN is serving your web pages. If the CDN firewall, called a Web Application Firewall (WAF), is blocking Googlebot by IP address you should be able to check for the blocked IP addresses and compare them to Google’s official list of IPs to see if one of them are on the list.

Google offers the following CDN-level debugging advice:

“If you need your site to show up in search engines, we strongly recommend checking whether the crawlers you care about can access your site. Remember that the IPs may end up on a blocklist automatically, without you knowing, so checking in on the blocklists every now and then is a good idea for your site’s success in search and beyond. If the blocklist is very long (not unlike this blog post), try to look for just the first few segments of the IP ranges, for example, instead of looking for 192.168.0.101 you can just look for 192.168.”

Read Google’s documentation for more information:

Crawling December: CDNs and crawling

Featured Image by Shutterstock/JHVEPhoto

Category News SEO
ADVERTISEMENT
SEJ STAFF Roger Montti Owner - Martinibuster.com at Martinibuster.com

I have 25 years hands-on experience in SEO, evolving along with the search engines by keeping up with the latest ...

universo-virtual.com
buytrendz.net
thisforall.net
benchpressgains.com
qthzb.com
mindhunter9.com
dwjqp1.com
secure-signup.net
ahaayy.com
soxtry.com
tressesindia.com
puresybian.com
krpano-chs.com
cre8workshop.com
hdkino.org
peixun021.com
qz786.com
utahperformingartscenter.org
maw-pr.com
zaaksen.com
ypxsptbfd7.com
worldqrmconference.com
shangyuwh.com
eejssdfsdfdfjsd.com
playminecraftfreeonline.com
trekvietnamtour.com
your-business-articles.com
essaywritingservice10.com
hindusamaaj.com
joggingvideo.com
wandercoups.com
onlinenewsofindia.com
worldgraphic-team.com
bnsrz.com
wormblaster.net
tongchengchuyange0004.com
internetknowing.com
breachurch.com
peachesnginburlesque.com
dataarchitectoo.com
clientfunnelformula.com
30pps.com
cherylroll.com
ks2252.com
webmanicura.com
osostore.com
softsmob.com
sofietsshotel.com
facetorch.com
nylawyerreview.com
apapromotions.com
shareparelli.com
goeaglepointe.com
thegreenmanpubphuket.com
karotorossian.com
publicsensor.com
taiwandefence.com
epcsur.com
odskc.com
inzziln.info
leaiiln.info
cq-oa.com
dqtianshun.com
southstills.com
tvtv98.com
thewellington-hotel.com
bccaipiao.com
colectoresindustrialesgs.com
shenanddcg.com
capriartfilmfestival.com
replicabreitlingsale.com
thaiamarinnewtoncorner.com
gkmcww.com
mbnkbj.com
andrewbrennandesign.com
cod54.com
luobinzhang.com
bartoysdirect.com
taquerialoscompadresdc.com
aaoodln.info
amcckln.info
drvrnln.info
dwabmln.info
fcsjoln.info
hlonxln.info
kcmeiln.info
kplrrln.info
fatcatoons.com
91guoys.com
signupforfreehosting.com
faithfirst.net
zjyc28.com
tongchengjinyeyouyue0004.com
nhuan6.com
oldgardensflowers.com
lightupthefloor.com
bahamamamas-stjohns.com
ly2818.com
905onthebay.com
fonemenu.com
notanothermovie.com
ukrainehighclassescort.com
meincmagazine.com
av-5858.com
yallerdawg.com
donkeythemovie.com
corporatehospitalitygroup.com
boboyy88.com
miteinander-lernen.com
dannayconsulting.com
officialtomsshoesoutletstore.com
forsale-amoxil-amoxicillin.net
generictadalafil-canada.net
guitarlessonseastlondon.com
lesliesrestaurants.com
mattyno9.com
nri-homeloans.com
rtgvisas-qatar.com
salbutamolventolinonline.net
sportsinjuries.info
topsedu.xyz
xmxm7.com
x332.xyz
sportstrainingblog.com
autopartspares.com
readguy.net
soniasegreto.com
bobbygdavis.com
wedsna.com
rgkntk.com
bkkmarketplace.com
zxqcwx.com
breakupprogram.com
boxcardc.com
unblockyoutubeindonesia.com
fabulousbookmark.com
beat-the.com
guatemala-sailfishing-vacations-charters.com
magie-marketing.com
kingstonliteracy.com
guitaraffinity.com
eurelookinggoodapparel.com
howtolosecheekfat.net
marioncma.org
oliviadavismusic.com
shantelcampbellrealestate.com
shopleborn13.com
topindiafree.com
v-visitors.net
qazwsxedcokmijn.com
parabis.net
terriesandelin.com
luxuryhomme.com
studyexpanse.com
ronoom.com
djjky.com
053hh.com
originbluei.com
baucishotel.com
33kkn.com
intrinsiqresearch.com
mariaescort-kiev.com
mymaguk.com
sponsored4u.com
crimsonclass.com
bataillenavale.com
searchtile.com
ze-stribrnych-struh.com
zenithalhype.com
modalpkv.com
bouisset-lafforgue.com
useupload.com
37r.net
autoankauf-muenster.com
bantinbongda.net
bilgius.com
brabustermagazine.com
indigrow.org
miicrosofts.net
mysmiletravel.com
selinasims.com
spellcubesapp.com
usa-faction.com
snn01.com
hope-kelley.com
bancodeprofissionais.com
zjccp99.com
liturgycreator.com
weedsmj.com
majorelenco.com
colcollect.com
androidnews-jp.com
hypoallergenicdogsnames.com
dailyupdatez.com
foodphotographyreviews.com
cricutcom-setup.com
chprowebdesign.com
katyrealty-kanepa.com
tasramar.com
bilgipinari.org
four-am.com
indiarepublicday.com
inquick-enbooks.com
iracmpi.com
kakaschoenen.com
lsm99flash.com
nana1255.com
ngen-niagara.com
technwzs.com
virtualonlinecasino1345.com
wallpapertop.net
nova-click.com
abeautifulcrazylife.com
diggmobile.com
denochemexicana.com
eventhalfkg.com
medcon-taiwan.com
life-himawari.com
myriamshomes.com
nightmarevue.com
allstarsru.com
bestofthebuckeyestate.com
bestofthefirststate.com
bestwireless7.com
declarationintermittent.com
findhereall.com
jingyou888.com
lsm99deal.com
lsm99galaxy.com
moozatech.com
nuagh.com
patliyo.com
philomenamagikz.net
rckouba.net
saturnunipessoallda.com
tallahasseefrolics.com
thematurehardcore.net
totalenvironment-inthatquietearth.com
velislavakaymakanova.com
vermontenergetic.com
sizam-design.com
kakakpintar.com
begorgeouslady.com
1800birks4u.com
2wheelstogo.com
6strip4you.com
bigdata-world.net
emailandco.net
gacapal.com
jharpost.com
krishnaastro.com
lsm99credit.com
mascalzonicampani.com
sitemapxml.org
thecityslums.net
topagh.com
flairnetwebdesign.com
bangkaeair.com
beneventocoupon.com
noternet.org
oqtive.com
smilebrightrx.com
decollage-etiquette.com
1millionbestdownloads.com
7658.info
bidbass.com
devlopworldtech.com
digitalmarketingrajkot.com
fluginfo.net
naqlafshk.com
passion-decouverte.com
playsirius.com
spacceleratorintl.com
stikyballs.com
top10way.com
yokidsyogurt.com
zszyhl.com
16firthcrescent.com
abogadolaboralistamd.com
apk2wap.com
aromacremeria.com
banparacard.com
bosmanraws.com
businessproviderblog.com
caltonosa.com
calvaryrevivalchurch.org
chastenedsoulwithabrokenheart.com
cheminotsgardcevennes.com
cooksspot.com
cqxzpt.com
deesywig.com
deltacartoonmaps.com
despixelsetdeshommes.com
duocoracaobrasileiro.com
fareshopbd.com
goodpainspills.com
kobisitecdn.com
makaigoods.com
mgs1454.com
piccadillyresidences.com
radiolaondafresca.com
rubendorf.com
searchengineimprov.com
sellmyhrvahome.com
shugahouseessentials.com
sonihullquad.com
subtractkilos.com
valeriekelmansky.com
vipasdigitalmarketing.com
voolivrerj.com
zeelonggroup.com
1015southrockhill.com
10x10b.com
111-online-casinos.com
191cb.com
3665arpentunitd.com
aitesonics.com
bag-shokunin.com
brightotech.com
communication-digitale-services.com
covoakland.org
dariaprimapack.com
freefortniteaccountss.com
gatebizglobal.com
global1entertainmentnews.com
greatytene.com
hiroshiwakita.com
iktodaypk.com
jahatsakong.com
meadowbrookgolfgroup.com
newsbharati.net
platinumstudiosdesign.com
slotxogamesplay.com
strikestaruk.com
trucosdefortnite.com
ufabetrune.com
weddedtowhitmore.com
12940brycecanyonunitb.com
1311dietrichoaks.com
2monarchtraceunit303.com
601legendhill.com
850elaine.com
adieusolasomade.com
andora-ke.com
bestslotxogames.com
cannagomcallen.com
endlesslyhot.com
iestpjva.com
ouqprint.com
pwmaplefest.com
qtylmr.com
rb88betting.com
buscadogues.com
1007macfm.com
born-wild.com
growthinvests.com
promocode-casino.com
proyectogalgoargentina.com
wbthompson-art.com
whitemountainwheels.com
7thavehvl.com
developmethis.com
funkydogbowties.com
travelodgegrandjunction.com
gao-town.com
globalmarketsuite.com
blogshippo.com
hdbka.com
proboards67.com
outletonline-michaelkors.com
kalkis-research.com
thuthuatit.net
buckcash.com
hollistercanada.com
docterror.com
asadart.com
vmayke.org
erwincomputers.com
dirimart.org
okkii.com
loteriasdecehegin.com
mountanalog.com
healingtaobritain.com
ttxmonitor.com
bamthemes.com
nwordpress.com
11bolabonanza.com
avgo.top