Google PageRank Algorithm: |
| |
| If you own, administer, or have designed a website intended for commercial purposes, such as an online store, or a site which relies on advertisement and page impressions for revenue, you've likely discovered that you can use search engines as a very powerful tool for free advertisement. |
| |
| In a world where there are no physical buildings which, by virtue of their very existence, advertise your presence, search engines serve as the equivalent of advertisement billboards. Since most Web surfers today have learned to rely upon search engines as the preferred method of finding information (upwards of 80% of all Internet users use search engines as their primary source of information gathering), having your site come up as one of the top ten search engine results is akin to having the biggest, brightest billboard. |
| |
| Now, you're probably already aware that good keyword research and proper keyword implementation on your site will generally yield a better return on investment (ROI) than massive advertisement campaigns. After all, if your page is listed atop all the major search engines, then people looking for the products or services you offer will likely find you at the same time as, or even before, they find your competitors. You're also probably already aware that getting to the top of search engine results, especially for popular keywords, is not something that "just happens." Pages are rated on their importance and relevance to the searcher's request via a number of factors, and when searchers go to their favorite search engine and type in what they're looking for, the results are sorted by what the search engine believes to be most relevant, in descending order. |
| |
| Enter Google. Currently the largest and most talked-about search engine on the Net, Google holds the number one position in the search engine market, with over 60% of all searches being processed in one way or another through the site's servers. That market position was attained through a relentless search for the best ways to judge content on the web, ensuring that web searchers can find what they're looking for without a lot of hassle. And although Google's system judges pages by a number of factors, one of the most important, especially to those attempting to use Google as a means of free marketing, is the concept of the Google PageRank. |
| |
| PageRank (PR) is the actual ranking of a page, as determined by Google. A page's rank can go from 0.15 well into the billions. I'll explain how these numbers come about a bit later in the article, and will go into further detail in the second part of this series. For now, just know that PageRank is one of the ways Google ranks a page's importance, and that PageRank is based on the number and PageRank of other sites linking to it. |
| |
| The way Google sees it, if page A gets linked to by page B (also known as a backlink from page B to page A), that means that page B is voting for page A, and in fact, as we'll see later, page B is actually giving a bit of its own PageRank to page A. In other words, PageRank says nothing about the content or size of the page. In fact, it doesn't even care about what text got linked to the page, just that it got linked. |
| |
| To quote Google's definition of PageRank: |
| |
| We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows: |
| |
| PR(A) = (1 - d) + d(PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)) |
| |
| Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages' PageRanks will be one. |
| |
| PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web. |
| |
| (NOTE: The statement "the sum of all web pages" is actually referring to the normalized sum, or the average, of all web pages' PageRanks.) |
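The "simple iterative algorithm" mentioned in the quote can be sketched roughly as follows. This is only an illustration on a made-up three-page web, not Google's actual implementation: the function name `iterate_pagerank`, the `toy_web` link map, the starting guess of 1.0, and the stopping tolerance are all assumptions chosen for the example. The sketch also assumes every page has at least one outgoing link.

```python
def iterate_pagerank(links, d=0.85, tol=1e-9, max_iter=1000):
    """Repeatedly apply PR(A) = (1 - d) + d * sum(PR(T) / C(T)) until stable.

    `links` maps each page to the list of pages it links to. Every page is
    assumed to appear as a key and to have at least one outgoing link.
    """
    ranks = {page: 1.0 for page in links}  # arbitrary starting guess
    for _ in range(max_iter):
        new_ranks = {}
        for page in links:
            # Each page linking here contributes its rank split evenly
            # across all of its outgoing links.
            incoming = sum(
                ranks[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_ranks[page] = (1 - d) + d * incoming
        change = max(abs(new_ranks[p] - ranks[p]) for p in links)
        ranks = new_ranks
        if change < tol:  # stop once the values have settled
            break
    return ranks


# Hypothetical web: A links to B and C, B links to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = iterate_pagerank(toy_web)
for page, rank in sorted(ranks.items()):
    print(page, round(rank, 3))
print("average:", round(sum(ranks.values()) / len(ranks), 3))
```

With this form of the equation, the ranks in the toy web settle on values that average to 1 rather than sum to 1, which is the point the note above makes.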
| |
| If you didn't understand that, don't worry. Here's a breakdown of the equation, and what it really means to you: |
| |
| 1. PR(A) - This is the PageRank of Page A. |
| |
| 2. PR(Tn) - This is the PageRank of Page Tn. |
| |
| 3. C(Tn) - This is the Count (number) of outgoing links from page Tn. For example, the number of links going out of page 1 is represented by C(T1), for page 2 it's C(T2), and so forth until Tn. Remember that each page spreads its vote out evenly among all of its outgoing links, so the more links a page has going out, the less each link is worth to the page to which it's linked. |
| |
| 4. PR(Tn)/C(Tn) - This is the PageRank of Page Tn divided by the number of links going out of that page. If Page A has been linked to by page Tn, then the share of the vote page A will get is the PageRank of Tn divided by the Count of outgoing links from page Tn. For example, if the PageRank of T1 was 5, and it had 10 outgoing links, then PR(T1)/C(T1) would be 5/10, or 0.5. So Page A would be getting 0.5 PageRank from Page T1's link. |
| |
| 5. d(...) - Since all the fractions of all the votes are added together, the result has to be dampened down to stop the other pages (T1, T2,...,Tn) from having too much influence. Given the previous example, presuming only three pages are linking to Page A, and presuming all pages linking to Page A have a PR of 5, as well as 10 outgoing links, then the equation d(PR(T1)/C(T1)+....) would be the following, when used with actual numbers: |
| |
| .85((5/10)+(5/10)+(5/10)) = .85 (1.5) = 1.275 |
| |
| In short, 1.275 is how much total PageRank Page A is now gaining from all of its incoming links. |
| |
| 6. (1-d) - This part of the equation is a bit of probability math, so that "the sum of all web pages' PageRanks will be one." (*) Think about it this way: (1 - d) is (1 - .85), which equals .15. If in our example only two pages were linking to Page A (all other things being equal), then d(PR(T1)/C(T1) + PR(T2)/C(T2)) would be .85 (the dampener) multiplied by 1 (the total PR being attained, not 1.5, as in the previous example), which means the equation would be as follows: |
| |
| PR(A) = (1 - .85) + .85(1) = .15 + .85 = 1, PR(A) = 1 |
| |
| Looking at it this way, it's pretty simple, but I'm sure you can see that the more incoming links you have, the more complicated the equation gets. Also, notice that even if Page A did not have any backlinks, the page would still hold a small PageRank (.15). |
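For readers who prefer to see the arithmetic run, here is a minimal sketch of the equation applied to a single page, using the numbers from the breakdown above. The function name `pagerank_of_page` and the list-of-pairs input format are assumptions made for this example only.

```python
def pagerank_of_page(backlinks, d=0.85):
    """Apply PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).

    `backlinks` is a list of (pagerank, outgoing_link_count) pairs, one for
    each page Tn that links to page A.
    """
    votes = sum(pr / out_links for pr, out_links in backlinks)
    return (1 - d) + d * votes


# Three linking pages, each with PR 5 and 10 outgoing links: .15 + 1.275
three_links = pagerank_of_page([(5, 10), (5, 10), (5, 10)])
# Two such linking pages: .15 + .85
two_links = pagerank_of_page([(5, 10), (5, 10)])
# No backlinks at all: only the (1 - d) term remains
no_links = pagerank_of_page([])

print(round(three_links, 3))
print(round(two_links, 3))
print(round(no_links, 3))
```

The three calls correspond to the three cases discussed above: three backlinks give .15 + 1.275 = 1.425, two backlinks give 1, and a page with no backlinks keeps the minimum of .15.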
| |
| Although a page's PageRank can range from a value of less than one to a value in the billions, most people will refer to PR as a number between 0 and 10. That's because they're referring to the Toolbar PageRank, which is the value displayed on the green graph in Google's toolbar (http://toolbar.google.com). Here's an example of how the two sets of numbers correlate: |
| |
| Real PageRank | Toolbar PageRank |
| 0 - 10 | 0 |
| 10 - 100 | 1 |
| 100 - 1,000 | 2 |
| 1,000 - 10,000 | 3 |
| 10,000 - 100,000 | 4 |
| |
| NOTE: Please remember that this is an example. We can't know the full details of the scale because the Real PageRank's upper limit is based on how many pages are on the Internet at the time of Google's indexing, which happens about once a month. (So yes, it is possible, but not too likely, for your page to increase or drop in rank after an indexing even if nothing other than the number of pages on the Internet has changed.) It is sufficient to realize, at least for our example, that the Toolbar PageRank is based on the exponentially increasing Real PageRank. (This is why it's harder for a page to go from PR 7 to PR 8 than for it to go from PR 1 to PR 4.) |
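The example scale can be sketched in a few lines of code. Everything here is an assumption taken from the example table only: the base of 10, the cap at Toolbar PR 10, the choice to count a boundary value such as 10 in the higher band, and the function names are all illustrative, since the real scale isn't public.

```python
def example_toolbar_pagerank(real_pr, base=10, max_toolbar=10):
    """Map a Real PageRank onto the example 0-10 toolbar scale."""
    toolbar = 0
    threshold = base
    # Each step up the toolbar scale needs `base` times more Real PageRank.
    while real_pr >= threshold and toolbar < max_toolbar:
        toolbar += 1
        threshold *= base
    return toolbar


def real_pr_needed(toolbar_pr, base=10):
    """Smallest Real PageRank that reaches a toolbar value (toolbar_pr >= 1)."""
    return base ** toolbar_pr


for real_pr in (5, 50, 1000, 99999):
    print(real_pr, "->", example_toolbar_pagerank(real_pr))

# Real PageRank that must be gained for each jump on the example scale.
gap_1_to_4 = real_pr_needed(4) - real_pr_needed(1)
gap_7_to_8 = real_pr_needed(8) - real_pr_needed(7)
print("PR 1 to PR 4:", gap_1_to_4)
print("PR 7 to PR 8:", gap_7_to_8)
```

On this assumed scale, moving from PR 1 to PR 4 takes 9,990 additional Real PageRank, while the single step from PR 7 to PR 8 takes 90,000,000.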
| |
| This is why brand new pages don't generally have any rank; they're not yet in the index. But what happens when a brand new page is created within your site AND it has a rank? Simply put, the Toolbar is guessing. Here's an example: |
| |
| A.com has a PR of 5. You decide to create 1.php and link it only from A.com. As soon as you create A.com/1.php, 1.php gets an automatic PR of 4. Is this because the page is that important? Not quite. What happens is that the toolbar looks at the URL of a page and strips off everything back to the last "/" after the top-level domain (.com, .org, etc.). After learning the PR of that page, the toolbar will then assign the value of the page doing the linking (in this case, A.com, the parent page) to the page being linked to (in this case, 1.php, the child page), and simply subtract "1" from that number. Should that child page link to a grandchild page (say, 2.php), then that grandchild page will have a PR of 3 (the PR of 1.php minus 1). Following this logic, here's what you get with sequentially linked pages. |
| |
| A.com (links to)-> | 1.php (links to)-> | 2.php |
| PR 5 | PR 4 | PR 3 |
| |
| ...and so forth, until the PR has been exhausted. |
| |
| (Please remember that this is an extremely simplified example. Real page linking strategies are a bit more complicated and will be covered in full during the second article in this series.) |
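The "subtract 1 per level" guess described above can be written out as a tiny sketch. It only mirrors the simplified example in this article, not how the toolbar is actually built; the function name `guessed_toolbar_chain` and the floor at 0 (the point where the PR "has been exhausted") are assumptions for illustration.

```python
def guessed_toolbar_chain(root_pr, depth):
    """Guess the toolbar PR for a chain of sequentially linked, unindexed pages.

    Index 0 is the indexed parent page; each later entry is one link
    further down the chain and gets the previous value minus 1, never
    dropping below 0.
    """
    return [max(root_pr - level, 0) for level in range(depth + 1)]


pages = ["A.com", "1.php", "2.php"]
guesses = guessed_toolbar_chain(root_pr=5, depth=len(pages) - 1)
for page, pr in zip(pages, guesses):
    print(page, "PR", pr)
```

For the chain above this gives PR 5, PR 4, and PR 3, matching the example, and a longer chain would bottom out at 0 after five links.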
| |
| Of course, pages which have just been created and are root domains (ex. http://www.example.com) will have a PageRank of 0 until they've been indexed by Google's spiders. |
| |
| This ends our discussion on what the Google PageRank is. In the next installment of this series, we'll get into specific examples of how different linking strategies can help optimally increase the ROI on your links in order to achieve the best possible PageRank. |