Sunday, June 24, 2018

Web Page Text indexed by Search Engines

Most search engines scan and index all of the text on a webpage. However, some search engines ignore certain common words known as stop words, which are explained below. Apart from this, almost all search engines ignore spam.

Stop Words

Stop words are common words that search engines ignore when processing a search phrase. This is done to save space on their servers and to speed up the search process.
When a search is conducted, the search engine excludes the stop words from the query and replaces each of them with a marker. A marker is a symbol substituted in place of a stop word. The intention is to save space: the search engine can store more web pages in the space freed up, while still retaining the relevance of the search query.
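To make the marker idea concrete, here is a minimal Python sketch. It assumes queries are plain whitespace-separated words, uses a small subset of the stop word list in this post, and picks `*` as a hypothetical marker symbol; search engines do not publish the marker they actually use. Keeping a marker in place, rather than deleting the word outright, preserves the position of the remaining words in the query.

```python
# Minimal sketch: swap each stop word in a query for a single marker symbol.
# STOP_WORDS is a small illustrative subset of the list in this post, and
# MARKER = "*" is a hypothetical choice, not any search engine's real marker.
STOP_WORDS = {"an", "and", "in", "of", "the", "to"}
MARKER = "*"


def mark_stop_words(query: str) -> list[str]:
    """Lowercase and split the query, replacing every stop word with MARKER."""
    tokens = query.lower().split()
    return [MARKER if token in STOP_WORDS else token for token in tokens]


print(mark_stop_words("The history of the Internet"))
# ['*', 'history', '*', '*', 'internet']
```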
Omitting a few words also speeds up the search process. For instance, suppose a query consists of three words. The search engine would generally make three runs, one for each word, and then display the listings. However, if one of the words can be omitted without making a difference to the search results, it can be excluded from the query, and the search process becomes faster.
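The speed-up can be sketched with a toy inverted index in Python: each word in the query costs one lookup (one "run"), and the pages found by each run are intersected. The index, page names and stop word subset below are hypothetical, chosen only to show that dropping "for" saves a run without changing the result.

```python
# Hypothetical stop word subset and toy inverted index (word -> pages containing it).
STOP_WORDS = {"an", "and", "for", "in", "of", "the", "to"}

INDEX = {
    "recipes": {"page1", "page3"},
    "for": {"page1", "page2", "page3"},
    "pasta": {"page1", "page2"},
}


def search(query: str, skip_stop_words: bool) -> tuple[set[str], int]:
    """Return the pages matching every query word and the number of runs made."""
    words = query.lower().split()
    if skip_stop_words:
        words = [word for word in words if word not in STOP_WORDS]
    results = None
    runs = 0
    for word in words:
        runs += 1  # one run over the index per remaining word
        pages = INDEX.get(word, set())
        results = set(pages) if results is None else results & pages
    return (results if results is not None else set()), runs


print(search("recipes for pasta", skip_stop_words=False))  # ({'page1'}, 3)
print(search("recipes for pasta", skip_stop_words=True))   # ({'page1'}, 2)
```

Both calls return the same page, but the second makes one fewer run, which is the trade-off described above.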
Some commonly excluded stop words are:
after
also
an
and
as
at
be
because
before
between
but
for
from
however
if
in
into
of
or
other
out
since
such
than
that
the
these
there
this
those
to
under
upon
when
where
whether
which
with
within
without
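The list above can be stored as a set so that checking any word is a single lookup. This minimal Python sketch assumes queries are plain whitespace-separated words without punctuation, and that this particular list is the one in use; real search engines tokenize text more carefully and each maintains its own list.

```python
# The stop words listed in this post, stored as a frozenset for fast membership tests.
STOP_WORDS = frozenset("""
after also an and as at be because before between but for from however if in
into of or other out since such than that the these there this those to under
upon when where whether which with within without
""".split())


def remove_stop_words(query: str) -> str:
    """Drop every query word found in STOP_WORDS (case-insensitive), keep the rest."""
    kept = [word for word in query.split() if word.lower() not in STOP_WORDS]
    return " ".join(kept)


print(remove_stop_words("Hotels near the beach in Goa"))  # Hotels near beach Goa
```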
