How Has Google Adopted LSI?

Semantic text processing is essentially rooted in linguistics. Consider a statement such as ‘I am optimizing a paragraph for a search engine.’ At least four words in it (I, am, a, for) are superfluous, in the sense that they contribute little to its meaning; they are there for grammatical reasons. In this way, natural language contains many words that are redundant from the point of view of search engines or semantic meaning. Function words such as conjunctions, prepositions and auxiliary verbs give a sentence its grammatical structure but carry little content. Ironically, these are the most frequently used words in English.
In the very first step of LSI, these words are identified and ignored, leaving the document with words that carry some semantic meaning. We can discard:
Articles, prepositions, and conjunctions
Common verbs and pronouns
Common adjectives (big, late, high)
Frilly words (therefore, thus, however, albeit, etc.)
Any word that appears in every document, or only in one particular document
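The filtering step above can be sketched in a few lines of Python. The stop list here is a small hypothetical sample chosen for illustration, not a standard list. `drop_uninformative` is likewise a hypothetical helper covering the last item: it removes words found in every document or in only one.

```python
import re
from collections import Counter

# Hypothetical stop list for illustration only; real systems use far larger lists.
STOP_WORDS = {
    "i", "am", "a", "an", "the", "for", "and", "or", "but", "of", "in",
    "on", "to", "is", "are", "it", "he", "she", "they", "big", "late",
    "high", "therefore", "thus", "however", "albeit",
}


def content_words(text: str) -> list[str]:
    """Lower-case the text, split it into runs of letters and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def drop_uninformative(docs: list[list[str]]) -> list[list[str]]:
    """Remove words that occur in every document or in only one document."""
    doc_freq = Counter(word for doc in docs for word in set(doc))
    n_docs = len(docs)
    return [[w for w in doc if 1 < doc_freq[w] < n_docs] for doc in docs]


if __name__ == "__main__":
    print(content_words("I am optimizing a paragraph for a search engine."))
    # ['optimizing', 'paragraph', 'search', 'engine']
```

The document-frequency filter only makes sense on a collection of more than two documents. With one or two documents, every word either appears everywhere or appears once, so everything would be dropped.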
Inside the Grid

Our document now has a much-reduced collection of words to which we can apply our statistical methodology, and we can start to index them. A common way to represent this index is as a grid, or matrix. This is why experts describe the LSI method as ‘thinking inside the grid’. The grid lists the documents along one axis and the words they contain along the other.
For a conventional keyword search, we simply put a cross (X) in the cell wherever a particular word appears in a particular document, and leave the cell blank if it does not. The grid then looks like this:
Document name / Keyword     Elevation   Topography  Height   Tiger
GIS mapping                 X           X           X
Topology                    X           X           X
Rainfall harvesting         X           X
Poetries of William Blake                                    X
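A presence-or-absence grid of this kind is easy to build programmatically. The sketch below uses a small hypothetical collection; the word lists are invented for illustration and are not taken from the article's documents. It marks an X wherever a keyword occurs at least once.

```python
# Hypothetical toy collection: content words left after stop-word removal.
DOCS = {
    "GIS mapping": ["elevation", "topography", "height", "elevation"],
    "Topology": ["topography", "elevation", "height"],
    "Rainfall harvesting": ["elevation", "topography", "rainfall"],
    "Poetries of William Blake": ["tiger", "tiger", "burning"],
}
KEYWORDS = ["elevation", "topography", "height", "tiger"]


def incidence_grid(docs: dict[str, list[str]], keywords: list[str]) -> dict[str, list[bool]]:
    """For each document, record True if a keyword occurs at least once."""
    grid = {}
    for name, words in docs.items():
        present = set(words)
        grid[name] = [k in present for k in keywords]
    return grid


def render(grid: dict[str, list[bool]], keywords: list[str]) -> str:
    """Draw the grid with an X for presence and a blank for absence."""
    header = "Document".ljust(28) + "".join(k.capitalize().ljust(12) for k in keywords)
    lines = [header.rstrip()]
    for name, row in grid.items():
        cells = "".join(("X" if hit else "").ljust(12) for hit in row)
        lines.append((name.ljust(28) + cells).rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(incidence_grid(DOCS, KEYWORDS), KEYWORDS))
```

Matching here is by exact string, so ‘topologies’ would not count as ‘topology’. This is the word-form problem noted in the next paragraph.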

Each cell holds either a cross or a blank. There is nothing in between, and this is enough for a keyword analysis of the documents. Note that when a word appears in a different form, say ‘topologies’ rather than ‘topology’, it is either left out or counted under a different column heading. If, instead of recording only whether each keyword is present in a document, we count how many times it appears, the grid may look something like this:
Document name / Keyword     Elevation   Topography  Height   Tiger
GIS mapping                 5           8           6        1
Topology                    6           6           3        0
Rainfall harvesting         2           3           7        0
Poetries of William Blake   0           0           0        5
These figures carry mathematical meaning. We can calculate the mean, median and mode of the occurrences of a given word across the collection, as well as the correlation between words. This gives us a detailed analysis of our document collection, and LSI builds on exactly this kind of analysis. After removing unnecessary words from the documents, it generates the term-document matrix. A graphical representation of this matrix gives the term space, which has as many dimensions as there are meaningful content words. This is because plotting the matrix requires one axis for each content word.
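The statistics mentioned above can be illustrated using the article's count table. The sketch below computes the mean, median and mode(s) of each keyword column, plus a correlation between two columns. Pearson correlation is an assumed choice of measure; the article does not say which one it means. Only the Python standard library is used.

```python
import math
import statistics

KEYWORDS = ["Elevation", "Topography", "Height", "Tiger"]

# Occurrence counts from the article's second grid: document -> count per keyword.
COUNTS = {
    "GIS mapping": [5, 8, 6, 1],
    "Topology": [6, 6, 3, 0],
    "Rainfall harvesting": [2, 3, 7, 0],
    "Poetries of William Blake": [0, 0, 0, 5],
}


def column(keyword: str) -> list[int]:
    """Counts of one keyword across all documents."""
    i = KEYWORDS.index(keyword)
    return [counts[i] for counts in COUNTS.values()]


def pearson(xs: list[int], ys: list[int]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


if __name__ == "__main__":
    for kw in KEYWORDS:
        col = column(kw)
        print(kw, statistics.mean(col), statistics.median(col), statistics.multimode(col))
    print("r(Elevation, Topography) =", round(pearson(column("Elevation"), column("Topography")), 3))
    print("r(Elevation, Tiger) =", round(pearson(column("Elevation"), column("Tiger")), 3))
```

On these counts, Elevation and Topography rise and fall together across documents, giving a strong positive correlation. Elevation and Tiger move in opposite directions and correlate negatively.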
Applying this to a real-life document collection and noting the occurrences of each content word would yield a very large number of relevant content words. If these are recorded in a matrix as above and plotted, the resulting term space will also have a very large number of dimensions. Each document in the collection is treated as a vector whose components are the counts of its content words. Documents that share many words have vectors that lie close to each other, so they are judged to be semantically close.
Documents with fewer words in common have vectors that are far apart, so they are judged to be semantically distant.
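The idea of documents as vectors can be made concrete with the same count table. The sketch below assumes cosine similarity as the measure of closeness. This is a common choice, but the article does not name one. The code ranks the other documents by their similarity to a given document.

```python
import math

# Occurrence counts from the article's second grid; each list is a document vector
# over the keywords Elevation, Topography, Height, Tiger.
COUNTS = {
    "GIS mapping": [5, 8, 6, 1],
    "Topology": [6, 6, 3, 0],
    "Rainfall harvesting": [2, 3, 7, 0],
    "Poetries of William Blake": [0, 0, 0, 5],
}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(name: str) -> list[tuple[str, float]]:
    """Other documents ranked from most to least similar to `name`."""
    target = COUNTS[name]
    scores = [(other, cosine(target, vec)) for other, vec in COUNTS.items() if other != name]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for other, score in most_similar("GIS mapping"):
        print(f"{other}: {score:.3f}")
```

With these counts, GIS mapping ranks Topology first and Poetries of William Blake last. Cosine similarity compares direction rather than length, so a long document and a short document on the same topic can still score as close.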
It is mathematically possible to describe this space, even though it is difficult to visualize. Still, trying to visualize this multi-dimensional space gives another interesting insight into LSI. Look at a tree branch full of green leaves.
Because leaves stick out in every direction, you can never see all of them at once. From whichever angle you look at the branch, some leaves are hidden behind others.
This idea can be thought of as ‘loss of information’, and it is a useful way to picture your n-dimensional term space. From whichever angle you look, some vectors in the term space always overlap others, and the boundaries blur or collapse. In other words, similar keywords or content words lose their distinct identity and get squeezed together. Hence the difference between singular and plural forms, or between synonyms and words of similar meaning, tends to vanish. Although loss of information is usually deemed a bad thing, it becomes a blessing in LSI. The technique exploits a feature of natural language, namely that words of similar meaning occur together, and in doing so it cuts out noise and unnecessary information. In the final lap, we can separate the wheat from the chaff.
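The article conveys this collapse of the term space only through the leaf analogy. In the standard formulation of LSI, the reduction is done with a truncated singular value decomposition (SVD): keep the k largest singular values and discard the rest. The sketch below applies this to the article's count table using NumPy. The value k = 2 is chosen arbitrarily for this toy example. This is an illustration of the technique, not a description of how Google implements it.

```python
import numpy as np

TERMS = ["elevation", "topography", "height", "tiger"]
DOCS = ["GIS mapping", "Topology", "Rainfall harvesting", "Poetries of William Blake"]

# Term-document matrix: one row per term, one column per document. These are the
# article's counts, transposed into the orientation usually used for LSI.
A = np.array(
    [
        [5, 6, 2, 0],  # elevation
        [8, 6, 3, 0],  # topography
        [6, 3, 7, 0],  # height
        [1, 0, 0, 5],  # tiger
    ],
    dtype=float,
)


def rank_k_approx(a: np.ndarray, k: int) -> np.ndarray:
    """Best rank-k approximation of `a` (least-squares sense) via truncated SVD."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]


def lsi_doc_vectors(a: np.ndarray, k: int) -> np.ndarray:
    """Coordinates of each document (one row each) in the k-dimensional reduced space."""
    _, s, vt = np.linalg.svd(a, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k, :]).T


def cosine(x: np.ndarray, y: np.ndarray) -> float:
    """Cosine of the angle between two vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


if __name__ == "__main__":
    docs_2d = lsi_doc_vectors(A, k=2)
    for i in range(1, len(DOCS)):
        print(f"{DOCS[0]} vs {DOCS[i]}: {cosine(docs_2d[0], docs_2d[i]):.3f}")
```

On this small matrix, the three terrain documents should end up pointing in almost the same direction in the two-dimensional space, while the Blake document sits on its own axis. This merging of related rows is the ‘useful loss of information’ described above.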
Every day, Google takes another step toward converting its whole search mechanism into an LSI-enabled one. LSI has not yet been adopted uniformly or in its entirety, and not every search returns a semantic word set yet. Even so, the transition is visible in the search results. A search for ‘phone’ shows results in which the keyword ‘phone’ is contained and highlighted. However, if you put a tilde (~) before the keyword (‘~phone’), the results include the Web site for Nokia, with the word ‘Nokia’ highlighted. From its new method of indexing, Google has determined that Nokia is relevant to phone.

Vikas Malhotra

Vikas Malhotra is a successful Internet marketer utilizing both pay-per-click marketing and search engine optimization to increase website traffic. To learn more, visit http://sem.mosaic-service.com

Article Source: http://www.articlesbase.com/seo-articles/how-has-google-adopted-lsi-22587.html
User published content is licensed under a Creative Commons License.
Copyright © 2005-2008 Free Articles by ArticlesBase.com, All rights reserved.