The Google PageRank formula fails!!!

SpaceshipSpaceship
Watch

leusmith

Established Member
Impact
1
Hi friends

Here is a blog that features an important article about how Google PageRank Formula fails. It took me a lot of research to author this article. Check it out and post your comments there or even on NamePros here.

The link to the blog is:http://zorh.blogspot.com
 
0
•••
The views expressed on this page by users and staff are their own, not those of NamePros.
Unstoppable Domains — AI StorefrontUnstoppable Domains — AI Storefront
The Google PageRank system: Is it really in use?

Sections in this article:

1.0-Introduction

2.0-Applying the PR system for the first time

· 2.1-During indexing

· 2.2-Afer reindexing

3.0-Applying thrPR system for reindexing

· 3.1-During reindexing

· 3.2-After reindexing

4.0-Conclusion

1.0-Introduction:
Google’s patented PageRank system of ranking a web page(or any other document that is available on the Internet) is ofcourse the world’s best algorithm for a search engine. It is this PageRank that has made Google different from other search engines. Since its introduction, PageRank has got almost all responsible webmasters on fire, looking for a link from a highly ranked web page. This system has also resulted in the development of

A separate stream of websites like Link Directories and Reciprocal Link Directories. When reviewed theoretically this system seems to be almost correct, but, when it comes to practical application it fails. The extent of failure of the PageRank system depends on how and when Google applies it to deliver search results.

Below is an extract from the article ‘The Anatomy of a Lareg-Scale Hyper Textual Web Search Engine’ by Larry Page and Sergey Brin.

“------------------Extract starts---------------“
2.1 PageRank: Bringing Order to the Web

The citation (link) graph of the web is an important resource that has largely gone unused in existing web search engines. We have created maps containing as many as 518 million of these hyperlinks, a significant sample of the total. These maps allow rapid calculation of a web page's "PageRank", an objective measure of its citation importance that corresponds well with people's subjective idea of importance. Because of this correspondence, PageRank is an excellent way to prioritize the results of web keyword searches. For most popular subjects, a simple text matching search that is restricted to web page titles performs admirably when PageRank prioritizes the results (demo available at google.stanford.edu). For the type of full text searches in the main Google system, PageRank also helps a great deal.
2.1.1 Description of PageRank Calculation

Academic citation literature has been applied to the web, largely by counting citations or backlinks to a given page. This gives some approximation of a page's importance or quality. PageRank extends this idea by not counting links from all pages equally, and by normalizing by the number of links on a page. PageRank is defined as follows:

We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages' PageRanks will be one.

PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web. Also, a PageRank for 26 million web pages can be computed in a few hours on a medium size workstation. There are many other details which are beyond the scope of this paper.

2.2 Anchor Text

The text of links is treated in a special way in our search engine. Most search engines associate the text of a link with the page that the link is on. In addition, we associate it with the page the link points to. This has several advantages. First, anchors often provide more accurate descriptions of web pages than the pages themselves. Second, anchors may exist for documents which cannot be indexed by a text-based search engine, such as images, programs, and databases. This makes it possible to return web pages which have not actually been crawled. Note that pages that have not been crawled can cause problems, since they are never checked for validity before being returned to the user. In this case, the search engine can even return a page that never actually existed, but had hyperlinks pointing to it. However, it is possible to sort the results, so that this particular problem rarely happens.

This idea of propagating anchor text to the page it refers to was implemented in the World Wide Web Worm [McBryan 94] especially because it helps search non-text information, and expands the search coverage with fewer downloaded documents. We use anchor propagation mostly because anchor text can help provide better quality results. Using anchor text efficiently is technically difficult because of the large amounts of data which must be processed. In our current crawl of 24 million pages, we had over 259 million anchors which we indexed.
“------------------Extract ends---------------“

The formula in the first section of the extract is referred to as the PageRank Formula or the PR Formula further in this article.

A few possibilites of how and when Google calculates the PageRank of a webpage (and how the system might fail) are discussed below.

To discuss in detail we might the following information as a base:

Let us consider a 2 database tables with the following fields. The database model may be of any kind that is with more or less fields for more options. But here is just a simple one to enable you to understand this article.

Table name: page_index

page_id

page_url

page_title

page_description

page_keywords

page_content

indexed_date

outgoing_links

incoming_links

page_rank

Table name: links

link_id

anchor_text

link_from_url

link_to_url

Here the table ‘page_index’ is the table that is searched when a user uses the search engine. The table links is used to maintain the records links containing in the page indexed. The fields page_id and link_id are auto incremented each time when a record is inserted(so that they act a unique serial number).

2.0-Applying the PR system for the first time

2.1-Applying the PageRank system for the first time if the PageRank is calculated during the indexing process:

To start let us consider two webpages A and B. We consider that the web has only two pages, so as to avoid complexity in understanding the problem. However this problem can be understood in the complex form also. The same results are obtained in every case- whether complex or simple. With each having a link to each other. When the indexing process is started with page A. The indexing program may need to index page B, in order to calculate the PageRank of page A, according to the formula mentioned in the extract from the article in section 1.0

Since he 2 pages mentioned above have reciprocal links, the indexing program might start off with an endless redirection thus calculating the PageRank of 0 pages even after a dime duration of 10 hours!

Suppose if the page B contains a link to page C, which in turn contains a link to page D and so on…, then the indexing progarm might reach the last page and then stop indexing if the last page contains no link. If the last page contains a link to any one of the previously crawled(but not indexed) pages then the indexing program might again get stuck in a process of endless redirection as mentioned above.

2.2-Calculating the PageRank after indexing all the pages for the first time:

Here again the same problem—endless redirection(as described in section 2.1) arises, if the PR is calculated after indexing all the pages once.

So inorder to get the PR system working, we need to manually set the PR of all or atleast a few(some big number that is small to a computer) in each category related(or most linked) sites. When Google was in its development stages there were about 10 million web pages on the Internet(just dream about accurately setting the PR of about 10000 web pages of different categories—Impossible job, so Google might have not done it).

The practical possibility is that the initial PR of each page on the web be set equal to another and then start the indexing process and calculate the PR for each page.

If the above method is applied, then there are equal probabilities that the PR of a web page(may or may not be a broken link), containing just links(about 2000 of them) may get a higher PR while an informative hompage(like that of Yahoo!’s) might get a low PR(may be 1 or even 0!).


3.0-Applying the PR system for reindexing

3.1-Calculating the PR during reindexing:

For a while let us assume that we have calculated the PRs of all the web pages. Now let us consider reindexing the pages and calculate their new PR value.

Let us start with a web page A. If we find a link to another webpage B then we index it (inorder to calculate the PR according to the PR Formula). If we find a link on page B to another page C, which in turn links to another page D and goes on… then we reach the nth page just to calculate the PR of page A. If page n is a broken link then the indexing program might stop working or again as mentioned in section2.1, might go on a cyclic ride.

In order to get this working again we might have to consider the old PR values each page that page A links to or gets links incoming links from, which is as good as never updating th PR.


3.2-Calculating the PR after reindexing:

Now assuming that we have indexed all the webpages on our computer. Again we might have to consider the old PR values of each webpage in order to calculate the new PR values. Thus ending with the same result as in section 3.1

4.0-Conclusion

Thus when the PageRank system is put into practical use we would end up with no PR value for a page or wrong (therefore useless) PR values. The same result is obtained whether the indexing program is run on one computer or on many computers simultaneously.

But we also have a contradiction: Google uses PR to deliver search results.

These results are relating to the keyword searched for.

So the question now is 'Is Google really using PR to deliver search results?'
Source: http://zorh.blogspot.com/
 
0
•••
You've put a lot of work into it! Great article!
 
0
•••
0
•••
reply

Nick
That is what Google claims......So I do not think we should consider that....
 
0
•••
leusmith said:
Nick
That is what Google claims......So I do not think we should consider that....
I know. But atleast it is better then what other people claims :)
 
0
•••
Friend......

No friend
The owner will always claim that this is so and this is that........But it should be proved. Anyone can make claims right?

Even you can work on an algorithm and claim it ot be better than PageRank but it has to be proved......
 
0
•••
So is the above one proved...

Can you please show us some proved ones. :)
 
0
•••
Reply

Nick are you joking? :)

Can you please explain what you mean by 'proved' ones?
I have proved the formula going wrong by all cases......
 
0
•••
You provided a speculative theory that does not contain any tangible evidence.
Just like Nick said your Pagerank article is just like one of many that consists of speculation.

There are to many factors involved to make a informative decision on the workings of Googles Pagerank calculation so i have to bluntly say you did not prove anything to make the claim "The Google PageRank formula fails!!!"

To claim The Google PageRank formula is failing you need to fully understand the complete Pagerank algorithm which is a continues changing and adapting process.

Do you understand the complete calculation process of Google Pagerank ;)
 
0
•••
leusmith said:
Nick are you joking? :)

Try searching google itself for pagerank theories. All writers claim that they are proved ones.

Now I am a n00b and if I want to believe any theory which one should I believe in. I am really confused on this one. That's why I was asking that can you show me some REAL proved ones.

Plus same question like damion:

Do you understand the complete calculation process of Google Pagerank If yes then please teach me :)
 
0
•••
There is one overall bad assumption here in that as you state in your last line:
So the question now is 'Is Google really using PR to deliver search results?'

The bad assumption is made because you are using the PR to search results as an absolute and it is not - as a matter of fact Google has stated on many occasions that PR is just a small part of their "ranking" algorithm and that there are over 100 other factors that affect how a page actually ranks in the Search Engine Result Pages (SERPs) that you see doing a standard search.

This can be proved very simply by taking any search - and looking at the page rank(if you believe you can actually find a tool that shows accurate page rank - I for one do not believe there is one that exists) of the top 10 results - I will bet you that they will not be ranked in order of their page rank :)
 
0
•••
i heard google changes their formula every now and then, which could play a part too. but the general factors that play a part in a website's pagerank are pretty much the same across most articles.
 
0
•••
Damion said:
You provided a speculative theory that does not contain any tangible evidence.
Just like Nick said your Pagerank article is just like one of many that consists of speculation.

There are to many factors involved to make a informative decision on the workings of Googles Pagerank calculation so i have to bluntly say you did not prove anything to make the claim "The Google PageRank formula fails!!!"

To claim The Google PageRank formula is failing you need to fully understand the complete Pagerank algorithm which is a continues changing and adapting process.

Do you understand the complete calculation process of Google Pagerank ;)


Yes I read through and understood the whole pagerank paper.....
That is why I have proved it mathematically

To Nick and everyone else:

On the my article on the blog you will find a link to the PageRank paper just go through section 2.12. Also a detailed paper on PageRank is available on the stanford university website. stanford.edu.

The paper available on the stanford website is only on pagerank. Th pdf format has about 22 pages. It uses the matrix method and is a variant of eigen vector centrality measure. Also if any of you need brief info on pagerank other algorithms then visit wikipedia and type pagerank in the search form. At the bottom of the article you will fing externall links to some site called some shetwecher.de (or something like that)

The article there explains 4 algorithms including the pagerank algorithm in brief.

Also please note that in the original page rank paper it is mentioned that, before calculating the PageRank after indexing the pagerank of each page is set equally.
In reference to this read section 3.2 in my article. That is what Google does when they do when they calculate(again as mentioned in the original PR paper in the stanford university site)

For the original PageRank paper visit stanford university site and click the link to publications and then search for the term pagerank. Click on the link to the latest article on PageRank there(2003, the old one is 2002)

---------------------------
This discussion is getting interesting. Everyone just be there. May be we might be able to explore something interesting :)

Also thanks to everyone for participating in this discussion. Let us continue peacefully.
---------------------------
 
0
•••
leusmith said:
Yes I read through and understood the whole pagerank paper.....
That is why I have proved it mathematically

There you go, you're under the impression that what you have read is everything what makes the behavior and results of Pagerank.
The pagerank paper you have read is not the entire pagerank algo so you have not proved anything.

Also SERP are not fully calculated on pagerank alone and there are more factors involved, factors that we don't know about or know for certain.

How is the pagerank algo failing? Do you think it fails by not showing accurate search results for keyword/phrase queries?
 
1
•••
-Nick- said:
leusmith said:
stanford.edu.
Humm.... so we have a stanford website on one hand and on the other hand we have US patent office. Moreover here is another big one http://pr.efactory.de

leusmith said:
For the original PageRank paper visit stanford university site and click the link to publications and then search for the term pagerank. Click on the link to the latest article on PageRank there(2003, the old one is 2002)
leusmith said:
Let us continue peacefully.
Yeah Its getting interesting. :)
 
Last edited:
0
•••
Google has something called adwords and adsense. PR ranks tell Google that their site is a good resource. and the higher the PR the higher the adds payout. does page rank help in search results I don't think it's a major factor in SERPS.
 
0
•••
john's TLD Domain said:
Google has something called adwords and adsense. PR ranks tell Google that their site is a good resource. and the higher the PR the higher the adds payout. does page rank help in search results I don't think it's a major factor in SERPS.
LOL that is why I told you that please don't go on making statements, bumping this and that. Just give out your opinions. Thanks god no one else noticed this and started a debate on it.

Now Who says this
john's TLD Domain said:
and the higher the PR the higher the adds payout.

If ads payout was based on PR then I would have been a billionaire by now ;)

Sorry I used Billionaire, because I am already reaching near millionaire with my penny paying ads and some coins here and there.
 
0
•••
Dynadot — .com TransferDynadot — .com Transfer
Appraise.net

We're social

Escrow.com
Spaceship
Rexus Domain
CryptoExchange.com
Domain Recover
CatchDoms
DomainEasy — Zero Commission
DomDB
  • The sidebar remains visible by scrolling at a speed relative to the page’s height.
Back