
What Is Latent Semantic Indexing (LSI)?

It would be easier to state what Latent Semantic Indexing is not than to try to explain what it is. Even a statistical mathematician would find it extremely difficult to explain the concept of LSI, sometimes referred to as latent semantic analysis, to a layman in just a few words!

LSI is not what most SEO experts claim it to be. It is certainly not a concept that can be used by the average web designer or webmaster to improve their search engine listings, and is not what many people, including myself, have written it to be. However, first some background.

The term 'semantics' refers to the science and study of meaning in language, and the meaning of characters, character strings and words. Not just the language and words themselves, but the true meaning being conveyed in the context in which they are used.

In 2002 a company called Applied Semantics, an innovator in the use of semantics in text processing, launched a program known as AdSense, which was a form of contextual advertising whereby adverts were placed on website pages which contained text that was relevant to the subject of the adverts.

The matching of text and adverts was carried out by software algorithms. It was claimed that these algorithms used semantics to analyze the meaning of the text within the web page. In fact, what the system initially seemed to do was match keywords within the page with keywords used in the adverts, though some further interpretation of meaning was evident in the way that some relevant adverts were correctly placed without containing the same keyword character string as used on the web page.
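The simplest form of the keyword matching described above can be sketched as follows. This is a toy illustration under my own assumptions, not Applied Semantics' or Google's actual algorithm; the page text, advert names and keyword lists are all invented for the example.

```python
# Hypothetical sketch of naive contextual ad matching: score each advert
# by how many of its keywords also appear verbatim in the page text.
import re

def tokenize(text):
    """Lower-case the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))

def match_score(page_text, advert_keywords):
    """Count how many advert keywords occur verbatim in the page."""
    return len(tokenize(page_text) & {k.lower() for k in advert_keywords})

page = "Compare cheap car insurance quotes from leading UK insurers."
adverts = {
    "Car Insurance Deals": ["car", "insurance", "quotes"],
    "Pet Food Offers": ["pet", "food"],
}
best = max(adverts, key=lambda name: match_score(page, adverts[name]))
print(best)  # the advert sharing the most keywords with the page
```

Note what this sketch cannot do: an advert about 'automobiles' would score zero against a page about 'cars', which is exactly the gap that semantic analysis is meant to close.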

Google launched its own contextual advertising system in March 2003, and acquired Applied Semantics just over a month later. AdSense as we know it was launched, and webmasters could make considerable sums of money by attracting visitors to web pages designed specifically for the purpose. Every click on an advert earned the owner of the website displaying it cash from Google.

It became commonplace for websites to comprise hundreds, even thousands, of software-generated pages containing repetitions of keywords and long-tail keyphrases, but little else. Thousands of pages could be generated, the only difference between them being the keyword or phrase used, with no content whatsoever for the visitor. Such software is still being sold on the internet in spite of all the attention given to the so-called LSI algorithm.

Google searched each web page registered for the AdSense system and determined the theme of the page by means of semantic analysis. At this time no distinction was made in the analysis between sites using only the same keyword repeatedly and those with genuine content relevant to the theme. Adverts related to this theme were then added to the page by Google.

These pages were ranked highly due to their high keyword density, and so many were generated that only a small proportion needed to become visible in the listings for their owners to make money from the adverts that Google placed on them. Such sites could generate several thousand dollars for their owners every single day without contributing any worth to the internet at all.
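The 'keyword density' these generated pages exploited is just the fraction of a page's words taken up by the target phrase. A minimal sketch, with an invented page and phrase purely for illustration:

```python
# Keyword density: proportion of the page's words occupied by occurrences
# of the target phrase. The sample text and phrase are illustrative only.
def keyword_density(text, phrase):
    words = text.lower().split()
    phrase_words = phrase.lower().split()
    n = len(phrase_words)
    # Count non-overlapping-agnostic occurrences of the phrase as a
    # contiguous run of words, then express them as a share of all words.
    hits = sum(1 for i in range(len(words) - n + 1)
               if words[i:i + n] == phrase_words)
    return hits * n / len(words) if words else 0.0

page = "cheap flights book cheap flights today cheap flights deals"
print(round(keyword_density(page, "cheap flights"), 2))  # 0.67
```

A density that high on a real page is a strong signal of machine-generated filler rather than genuine content, which is precisely what Google set out to detect.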

In order to control this 'spamming' of its search indices with worthless websites, Google decided to add what it termed LSI, or latent semantic indexing, to its indexing algorithm, very similar to what it was using to determine the theme of AdSense pages. What this claims to do is analyze the semantic content of websites and determine the true value of the site to any visitor using a specific search term.

This value was assessed by looking for words and phrases semantically similar to the keywords used, rather than only the keywords themselves. In this way, pages containing keywords with little other contextually related content were rooted out, and those pages were either de-listed or demoted in Google's search index for those keywords.
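In the statistical literature, latent semantic analysis is a truncated singular value decomposition of a term-document matrix, under which documents using different but related words can still land close together. A toy sketch, with a tiny invented corpus and an arbitrary rank chosen purely for illustration:

```python
# Toy latent semantic analysis: build a term-document matrix, truncate its
# SVD to k dimensions, and compare documents in that latent space.
import numpy as np

docs = [
    "car automobile engine",
    "automobile engine repair",
    "cooking recipe kitchen",
]
vocab = sorted({w for d in docs for w in d.split()})
# Term-document matrix: rows are terms, columns are documents.
A = np.array([[d.split().count(t) for d in docs] for t in vocab], float)

# Keep only the k largest singular values (the "latent" dimensions).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dimensional vector per doc

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The two car-related documents come out far more similar to each other
# than either is to the cooking document, even though 'car' never appears
# in the second document.
print(cosine(doc_vecs[0], doc_vecs[1]) > cosine(doc_vecs[0], doc_vecs[2]))
```

This is the whole of the technique: a decades-old matrix factorization applied at indexing time, not a set of rules a webmaster can follow on a page.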

LSI is now regarded as a major means of optimizing web pages to conform to the requirements of the Google algorithms. Minimal use of keywords, and more use of synonyms and phrases relevant to the contextual meaning of the keyword relating to the page, became the way to use LSI to achieve higher listings. Or so the SEO experts informed us. In fact, the concept of latent semantic indexing has been known in statistical analysis for decades, and is not something that can be 'used' as such on a website.

There are many SEO websites suggesting that they can provide a service to make your website 'LSI friendly', or 'meet LSI requirements'. One way of doing this, it is suggested, is to stuff the page full of synonyms and other related terms. I have written articles myself about how this can be done, and tried to suggest the correct way to use LSI. Although my suggested 'use' of LSI was erroneous in scientific terms, the ideas introduced are nevertheless good practice and will help you to produce web pages containing genuine content.