Posted to commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2014/12/02 08:23:31 UTC

[Solr Wiki] Update of "PracticalSearch" by StanHu

The "PracticalSearch" page has been changed by StanHu:
https://wiki.apache.org/solr/PracticalSearch

New page:
= A Practical Search User’s Guide: How to Find and Be Found =
<<TableOfContents(3)>>
== Introduction ==
{{ http://www.doublespark.co.uk/tl_files/images/news/2013/google-mobile-search.jpg|Mobile Search|width=478}}
	Search applications are becoming more and more prevalent as technology expands. High-speed internet in the United States is starting to be taken for granted, and users are looking for more convenience in how they use the internet. According to [[http://dailyreckoning.com/the-six-most-fascinating-technology-statistics-today/ | an article on the Daily Reckoning]], mobile users accounted for less than 1% of internet traffic in 2009, but that share skyrocketed to 15% by 2013. The numbers are clear: people love convenience. However, this convenience applies not only to how people access the internet (mobile phones versus computers, for example) but also to how they search for information. 
= All About SEO =
	Search all begins with [[http://en.wikipedia.org/wiki/Search_engine_optimization|SEO]]. Knowing what Search Engine Optimization is and how it works is invaluable, both for improving your own searches as a consumer and for figuring out how to design a website or pick keywords that increase the probability of users finding that site. Of course, there are people out there who wish to take advantage of how search engines work in order to bring more users to their site with no regard for the user’s experience. The dirty tactics these people use are referred to as ''Black Hat SEO'', which we will cover before looking at the good stuff.
== Knowing Black Hat SEO: Learning To Play Fair ==
{{ http://i.imgur.com/lXqChR4.png|Black Hat SEO|width=289}}
	Before we dive into how we can use SEO to our advantage, we must first be aware of the limits of SEO. Optimizing SEO is one thing, but exploiting SEO is another. Black Hat SEO, also known as spamdexing, is a term referring to the purposeful manipulation of search engine indexes for the sole purpose of increasing the relevance of a site. It sounds like an evil practice, which it is, but there are many people who are guilty of it and do not even know it.
=== Why Cheating Never Wins ===
{{http://i.imgur.com/KBOnzKE.png|Football Tags|width=425}}
	Cheating search engines is a common way to try to popularize a site or a link, but in the end, it does not even work. One example is YouTube videos and tags. YouTube offers a tagging system that lets users attach many tags relevant to their video in order to help it be found. From a marketing perspective especially, it does not seem like a bad idea to add other tags that might appeal to the viewer even though they do not directly relate to the video. For instance, it is not necessarily wrong to assume that people who like Football might be interested in Soccer and Nascar, or vice versa, but putting these as tags on a video with Football as its primary focus, as in the screenshot above of a low-ranking football video, is a perfect example of Black Hat SEO. Not only is this inherently evil, but ironically enough, it does not even work. It is not just the number of views or clicks a link gets that earns it that coveted #1 rank on the search listings page; search engines collect a lot of data about links and how “relevant” they are. An important statistic to take note of here is the [[http://en.wikipedia.org/wiki/Bounce_rate|Bounce Rate]]. The bounce rate of a site is the percentage of users who enter the site and leave without visiting any other page on it. If a site has a high bounce rate (that is, if a large percentage of users click on the link and then immediately back out), the search engine will lower the relevance ranking of the site. This is precisely why Black Hat techniques, even if they succeed initially, eventually leave the site worse off. Using proper SEO tactics, as we will see later, is a much better method for gaining popularity in the long run.
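To make the bounce rate definition concrete, here is a minimal sketch of the calculation. The function name and the visit data are hypothetical, and real analytics tools may define a "bounce" in more detail than this.

```python
def bounce_rate(sessions):
    """Return the percentage of sessions that viewed exactly one page.

    `sessions` is a list of page-view counts, one per visit (hypothetical data).
    """
    if not sessions:
        return 0.0
    bounces = sum(1 for pages_viewed in sessions if pages_viewed == 1)
    return 100.0 * bounces / len(sessions)


# Hypothetical visits: each number is how many pages one visitor opened.
visits = [1, 1, 4, 1, 2, 1, 1, 3]
print(f"Bounce rate: {bounce_rate(visits):.1f}%")  # prints "Bounce rate: 62.5%"
```

Five of the eight made-up visits stop after a single page, so the bounce rate is 62.5%.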
=== Other Black Hat Techniques ===
	There are many other dirty tactics aside from the YouTube tag example above (which is a great example of ''keyword stuffing''). Note what all these Black Hat tactics have in common: they target the search engine, for the sole purpose of relevance ranking, rather than thinking about the actual user. Let us go over them first, so we can learn more about them and why we should avoid them. The tactics listed below are the main forms of spamdexing; note that there are many other Black Hat techniques that are not listed in this article.
==== Article Spinning ====
	''Article Spinning'' is the practice of slightly altering existing, relevant content and republishing it on one's own website. Most of the time, article spinning is done automatically by a script that replaces words with synonyms. For example, take a look at this sentence and what it might look like after an article spinning script is run on it:
{{http://i.imgur.com/Yly08At.png|Article Spinning Example|width=455}}
It is easy to see how artificial this sounds, so why do people use it? It is used solely to post content that search engines will look at while ranking pages, which is spam by definition. For instance, the “spun” content may make up half a page, while the other half is full of advertisements. It is not only considered a crime against SEO, but it can also infringe copyright laws, such as the [[http://en.wikipedia.org/wiki/Digital_Millennium_Copyright_Act|Digital Millennium Copyright Act]] and the [[http://en.wikipedia.org/wiki/Online_Copyright_Infringement_Liability_Limitation_Act|Online Copyright Infringement Liability Limitation Act]].
==== Cloaking ====
	''Cloaking'' is the shadowy practice of showing certain things to web crawlers but not the users. For example, using certain keywords, descriptions, and titles that do not pertain to the actual content of the website is considered cloaking. This tactic is just outright lying to search engines about the content of a website, and will easily lead to a website being banned.
==== Doorway Pages ====
A ''doorway page'' is a simple page that contains little content besides keywords and phrases. The purpose of such a page is to spam a search engine’s index with results. Often, these pages are misleading and will redirect the user to sites unrelated to the initial doorway. Most doorway pages are also stuffed with misleading keywords and phrases. If you have ever landed on a page and then ended up right back on it while trying to navigate away with the back button, it was most likely a doorway page, as these often have horrifying circular navigation. If such pages are reported, they can be permanently taken off the search results list.

{{http://i.imgur.com/x3vG4gl.png|Doorway page example|width=610}}
{{http://i.imgur.com/8rEdkAf.png|Doorway page example|width=610}}
==== Hidden/Non-Visible Text ====
	''Hidden text'' is exactly what it sounds like. It is pretty much the method used to do under-the-radar ''keyword stuffing''. Well, under-the-radar until the user highlights it and reveals how innately evil the site is. It fills a webpage with keywords that search engines recognize, but users (most of the time) do not. Search services such as [[https://www.google.com/?gws_rd=ssl|Google]] have been taking action against this; for example, if they notice that the color of some text is the same as the background, they rank the page lower.

{{http://i.imgur.com/88QswFS.png|Hidden text example|width=610}}
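The same-color check described above can be sketched in a few lines. This is only an illustration under strong assumptions: it looks at inline `style` attributes alone and compares the color strings literally, whereas a real search engine would have to resolve stylesheets and computed colors. The class and function names are made up for this example.

```python
from html.parser import HTMLParser


def parse_style(style):
    """Turn an inline style string into a {property: value} dict."""
    declarations = {}
    for part in style.split(";"):
        if ":" in part:
            name, value = part.split(":", 1)
            declarations[name.strip().lower()] = value.strip().lower()
    return declarations


class HiddenTextFinder(HTMLParser):
    """Collect tags whose inline text color equals their inline background color."""

    def __init__(self):
        super().__init__()
        self.suspicious = []

    def handle_starttag(self, tag, attrs):
        # A valueless style attribute comes through as None, hence the `or ""`.
        style = parse_style(dict(attrs).get("style") or "")
        color = style.get("color")
        if color is not None and color == style.get("background-color"):
            self.suspicious.append((tag, color))


# Hypothetical page: the first paragraph is white text on a white background.
page = (
    '<p style="color: #fff; background-color: #fff">cheap sushi sushi sushi</p>'
    '<p style="color: #000; background-color: #fff">Welcome to our menu.</p>'
)
finder = HiddenTextFinder()
finder.feed(page)
print(finder.suspicious)  # prints [('p', '#fff')]
```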

==== Link Purchasing/Link Exchange/Link Farming ====
	These three link tricks are all attempts to make a website appear more popular. ''Link Farming'' is a tactic in which a group of websites all link to each other, fooling search engines into believing that those sites have many connections. Such circular links do not bring any real traffic to the sites themselves. ''Link Buying'' and ''Link Exchange'' programs are arrangements between sites in which links are purchased or traded so that each site gains inbound links. The intention is similar to that of ''Link Farming'': the site may appear more popular because of all its connections to other sites. Google specifically forbids excessive link exchanging.
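One simple signal of link exchanging is a pair of sites that link to each other. The sketch below finds such pairs in a tiny, hypothetical link graph; the site names and the `reciprocal_pairs` helper are made up for illustration, and this is not a description of how any search engine actually detects link schemes.

```python
def reciprocal_pairs(links):
    """Return each pair of sites that link to one another, in sorted order."""
    link_set = set(links)
    pairs = {
        tuple(sorted((source, target)))
        for source, target in link_set
        if source != target and (target, source) in link_set
    }
    return sorted(pairs)


# Hypothetical link graph as (linking site, linked site) pairs.
links = [
    ("a.example", "b.example"),
    ("b.example", "a.example"),
    ("b.example", "c.example"),
    ("c.example", "b.example"),
    ("d.example", "a.example"),
]
print(reciprocal_pairs(links))
# prints [('a.example', 'b.example'), ('b.example', 'c.example')]
```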
==== Scraper Sites ====
	''Scraper Sites'' are evil sites that work along the same lines as ''Article Spinning''. Like article spinning, scraper sites aim to generate content for their site by scraping content from other sites (again, this poses a [[http://www.plagiarism.org/plagiarism-101/what-is-plagiarism/|plagiarism]] issue). The sole purpose of this is to lure visitors to the site so that they click on its advertisements. In rare cases, scraper sites outrank the websites the content originally came from; however, when such cases show up, it is really easy for administrators to ban the site.
== White Hat SEO: Using SEO Correctly ==
	Now that we’ve learned about what ''Black Hat SEO'' – or, the abuse of search engine mechanics – is, and what it leads to, it is time to learn about methods that can be employed in order to help promote a site and make it friendly not only to search engines but to the people that ultimately will visit the page.

	Again, Black Hat SEO takes advantage of the search engine by any means possible for the sole purpose of reaching high ranking on pages, and targets the search engine rather than the user. White Hat SEO functions oppositely, but for the same purpose. The eventual goal of any site is to broadcast its content to as many people as possible, which brings us to our first point: accessibility.
{{ http://i.imgur.com/l1R3uXT.png|White Hat SEO|width=111}}
=== Optimizing Accessibility ===
	The first step of optimizing the accessibility of a webpage is making it “visible” to the most constrained viewer of all: the search engines themselves. A search engine does not have the capability to analyze certain elements, such as images or applications. If you have a web page that consists of little text and an application that does amazing things, so what? Search engines also do not care about how fancy your website is. All these visual decorations on your site are invisible to the search engine. Without a text representation of these non-text elements, the search engine does not even know what you have to offer. Here is a representation of what you see versus what a search engine sees, courtesy of [[http://www.webconfs.com/search-engine-spider-simulator.php |webconfs’ Search Engine Spider Simulator]].
{{ http://i.imgur.com/ZM4Ttka.png|What You See vs What the Search Engine Sees|width=610}}

Do not take text representation too literally – believe it or not, search engines cannot even interpret all text. What is an example of this? ASCII art and formatting. Search engines are intelligent in their own ways, but there is a lot of beauty in websites that only the human eye can perceive.

{{ http://th02.deviantart.net/fs71/PRE/f/2013/240/e/e/ascii_art_my_little_pony_by_rwmcfa1-d6k1ks9.png|ASCII art example|width=405}}

Aside from making sure that such elements have text components, you need to make sure that your website is fully functional even if applications, scripts, or other elements are disabled. ''Not all browsers support these functions, and some users tend to disable applications in their preferences. '' If a webpage depends on such scripts to function, and they do not run for whatever reason, the website will fare poorly. It can lead to your website receiving a low ranking, or it may never be indexed in the first place.

A good way to check whether your website is functional without fancy programmatic applications is to look over your site with a [[http://en.wikipedia.org/wiki/Text-based_web_browser |text-based web browser ]] such as Lynx, or with the web browser built into [[http://www.gnu.org/software/emacs/ |Emacs ]]. If you can’t see all the prominent features of your site, that is a sign that you need to do some adjusting, as the skeleton of your site that you are looking at is pretty much what search engine crawlers are seeing. 
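For a rough feel of that “skeleton”, the sketch below strips a page down to its visible text and image alt text using Python's standard `html.parser` module. It is only an approximation of what a crawler might extract, not a reimplementation of any real spider; the `TextOnlyView` class and the sample page are made up for illustration.

```python
from html.parser import HTMLParser


class TextOnlyView(HTMLParser):
    """Keep visible text and image alt text; skip script and style bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            # An image contributes text only if it carries alt text.
            alt = dict(attrs).get("alt")
            if alt:
                self.chunks.append(f"[image: {alt}]")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


# Hypothetical page: one image has alt text, the other does not.
page = """
<html><head><title>Sushi Palace</title><style>body { color: red; }</style></head>
<body><h1>Menu</h1><img src="roll.png" alt="California roll">
<img src="banner.png"><script>drawFancyCarousel();</script></body></html>
"""
viewer = TextOnlyView()
viewer.feed(page)
print(viewer.chunks)  # prints ['Sushi Palace', 'Menu', '[image: California roll]']
```

Notice that the banner image without alt text and the script-driven carousel leave no trace at all in the extracted text.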

Ensuring that your site is viewable by web crawlers is good, but do not forget about the users themselves; making the site convenient to use is an obvious, and integral, part of user accessibility. Remember: ''the more accessible a site is to its users, the more accessible it is to a search engine. '' Below is a list of guidelines (that nearly seem like common sense!) to follow while considering the accessibility of your site:

* '''''Be concise.''''' It is very important (as we will see in the next section regarding  ''Anchor Text'') to have a descriptive and relevant title, as well as alternate text and tags.

* '''''Be simple.''''' Make sure your site has a clear structure. Users should not have to follow multiple links to reach a desired page. 

* '''''Be organized.''''' You need a way to sell all the important information on your website to the web crawler. The ideal way to do this is with sitemaps; an [[https://www.xml-sitemaps.com/|XML sitemap ]] is good for feeding information to web crawlers, and an [[https://xmlsitemapgenerator.org/|HTML sitemap ]] is good for helping users find specific pages on the website, as it is essentially an index for your website.

* '''''When in doubt, use text.''''' Search engines can’t see text in images, or in any other form of media or application. Using text in the most important places ensures that the search engine can evaluate the content on your site to the fullest extent.
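To give an idea of what an XML sitemap contains, here is a minimal sketch that builds one with Python's standard library. It emits only the required `<loc>` element for each page, using the namespace from the sitemaps.org protocol; the `build_sitemap` helper and the example.com URLs are placeholders, not part of any particular tool.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) listing each URL in a <url><loc> entry."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder pages for a hypothetical site.
pages = ["http://www.example.com/", "http://www.example.com/menu"]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

The resulting file would typically be saved as `sitemap.xml` at the root of the site; optional elements such as `<lastmod>` are left out of this sketch.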

=== Anchor Text ===
	''Anchor text'' is regarded as one of the most powerful tools in modern searching. What exactly is anchor text? It is simply the visible, clickable text of a link. For instance, refer to the link tag below:
{{{
<a href="http://www.google.com">An Awesome Search Site</a>
}}}
In this example, the anchor text of the link to Google is “An Awesome Search Site”. Why is this so important? Link relevancy, a very powerful metric for determining rankings on search engines, is determined by not one, but two things: the anchor text, and the content of the source page. Search engines look at all the different anchor texts of the links pointing to a site to determine what kind of content it holds and, in effect, which queries it should rank highly for. 

''Something important to remember is that, when looking at anchor text on a web page, if two links are targeted towards the same URL, the search engine only considers the first anchor text. ''
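The first-anchor-text rule can be illustrated with a small sketch that collects anchor text from a page and keeps only the first text seen for each URL. It simply mirrors the rule as stated above, using Python's standard `html.parser`; the `AnchorTextCollector` class is a made-up name, and the code makes no claim about how a real search engine is implemented.

```python
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Record the anchor text of the first link to each URL on a page."""

    def __init__(self):
        super().__init__()
        self.first_anchor = {}
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace, then keep only the first text per URL.
            text = " ".join("".join(self._text).split())
            self.first_anchor.setdefault(self._href, text)
            self._href = None


page = (
    '<a href="http://www.google.com">An Awesome Search Site</a> '
    '<a href="http://www.google.com">click here</a>'
)
collector = AnchorTextCollector()
collector.feed(page)
print(collector.first_anchor)
# prints {'http://www.google.com': 'An Awesome Search Site'}
```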

When Google released the [[http://en.wikipedia.org/wiki/Google_Penguin |Google Penguin]] update, keywords in anchor text became more closely examined. Specifically, Google began looking for variety in anchor text strings; when the same anchor text string appeared again and again, its keywords were flagged as “suspicious”. ''The most viable way to get better results is to go for a varied assortment of anchor texts. '' Remember: while trying to gain more inbound links and relevant anchor text, do not resort to the Black Hat strategies of link farming or link exchanging, because they will hurt you more in the end!

Another reason why anchor text is so powerful is because in most cases, people tend to use the title of the webpage or domain for it. For this reason, if you are working on a site, it would be in your best interest to include big, relevant keywords that you want to rank high in, as the domain name, title name, or both.

{{ http://i.imgur.com/KpVjS6I.png |Domain/Website title example|width=289}}

=== Keywords ===
	Keywords are a huge component of SEO. They are the words and phrases that get visitors to your site, and make up the very queries that you want to rank high for. Understanding how to optimize keywords is essential to making it to the top of the search results, whether you are a website designer, YouTuber, blogger, etc.
==== Keyword Demand ====
{{ http://d1avok0lzls2w.cloudfront.net/img_uploads/search-demand-curve%281%29.gif |The Search Demand Curve|width=596}}
	The above picture represents the search demand curve. The “fat head” includes the most popular searches, while the “long tail” represents the most numerous searches, with the “chunky middle” being the area in between. So what does this all mean? 
From an outside perspective, it might seem like the best idea is to rank highest for the most popular keywords, that is, something on the “fat head” side. However, the graph shows that the “fat head” searches, while they ''are'' the most popular searches, account for less than 20% of all keyword searches. The “long tail” is where the money is: it contains a huge number of unique searches that collectively make up the bulk of the world’s search demand.  Research has shown that “long tail” searches may even be individually more valuable than “fat head” searches, simply because “fat head” searches are so broad, which is precisely why they are popular. For instance, a user searching for a “fat head” term as simple as “food” is probably just browsing, while a user looking for a more specific “long tail” term, such as “nearby cheap sushi buffet”, is far more valuable to businesses! 
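The arithmetic behind the head-versus-tail split is easy to sketch. The query counts below are invented purely so that the numbers come out round; they are not real search data, and `head_share` is a hypothetical helper.

```python
def head_share(query_counts, head_size):
    """Fraction of all searches that come from the `head_size` most popular queries."""
    ranked = sorted(query_counts, reverse=True)
    total = sum(ranked)
    return sum(ranked[:head_size]) / total if total else 0.0


# Invented data: 2 very popular queries, 50 mid-range ones, 600 one-off queries.
counts = [90, 60] + [5] * 50 + [1] * 600
share = head_share(counts, head_size=2)
print(f"Fat head: {share:.0%} of searches, everything else: {1 - share:.0%}")
# prints "Fat head: 15% of searches, everything else: 85%"
```

Even though each of the two head queries is far more popular than any single tail query, together they make up only 150 of the 1,000 searches in this made-up sample.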

{{ http://i.imgur.com/1qgEXh6.png |Picking the Right Keywords; Quality Over Quantity|width=267}}

==== Keyword Choice ====
Now that we’ve looked at keyword demand and figured out a general idea of what we want to use for our keywords, what strategies are there for designing effective keywords? There are many tricks; some are detailed in these guidelines below.

* '''''Be specific.''''' As we have seen in the keyword demand section, “long-tail” keywords account for the largest percent of the world’s searches. As such, try and make your keywords as specific as possible so they can reach out to the intended audience the easiest. “Storage” is a very “fat head” term; we also do not know if it is referring to digital storage or physical storage. “Intel matrix storage manager” – now that’s the kind of specificity we’re talking about.

* '''''Repeat, repeat, repeat.''''' You want to be very repetitive with your keywords. It sounds like a tedious task, but yes, it needs to be done, though not all by hand. A lot of tools have been developed specifically for the purpose of finding different variations of keywords. Never settle for just “hard drive failure”; make sure you have keywords for “hard drive malfunction”, “hard drive recovery”, and even “hard drive not working”, as well! This is essential, as these variations help marketers expand their keyword portfolios. 

* '''''Start with keywords, don’t end with them.''''' It is a hassle to go back through your website and try inserting keywords all over the place in the end. Often, it is hurried, and can clearly show. If you design a website around certain keywords, everything that you make will revolve around these keywords, and you won’t have to shove them in forcibly.

* '''''Use analytics.''''' Just do it. [[http://www.google.com/analytics/ |Google Analytics]] is simple and free to use, and gives a lot of invaluable data to help you optimize your keywords. Traffic trends can show you which keywords are responsible for bringing in the most users, and you can use that information to determine what your website should focus on.

==== Keyword Placement ====
	“Great! I have a ton of keywords now. Where do I put them?” There are a lot of great places to put keywords. The most obvious is the keywords Meta Tag in your website’s HTML code. As we discussed earlier in the anchor text section, it is also smart to put very strong keywords in your titles and domain name. Note that aside from those specific locations, search engines do take into account the first 200 words of a page. But watch out! Use keywords ''appropriately'' in the body of your website: relentlessly throwing in keywords will ultimately harm your website, as this is the Black Hat SEO method we discussed earlier, known as ''keyword stuffing''.
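A quick way to sanity-check keyword placement is to count how often a phrase appears in the first 200 words and in the page as a whole. The sketch below does that with a simple word tokenizer; the `keyword_report` helper is hypothetical, the "density" it reports is just the share of words belonging to the phrase, and no particular threshold for keyword stuffing is implied.

```python
import re


def keyword_report(text, keyword, window=200):
    """Count a keyword phrase in the first `window` words and in the whole text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    phrase = keyword.lower().split()
    n = len(phrase)
    if n == 0 or not words:
        return {"in_first_words": 0, "total": 0, "density": 0.0}

    def count(tokens):
        # Slide over the tokens and count exact matches of the whole phrase.
        return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase)

    total = count(words)
    return {
        "in_first_words": count(words[:window]),
        "total": total,
        "density": total * n / len(words),
    }


body = "Hard drive recovery made simple. Our hard drive recovery guide covers backups."
report = keyword_report(body, "hard drive recovery")
print(report)  # prints {'in_first_words': 2, 'total': 2, 'density': 0.5}
```

A density of 0.5 on a twelve-word sample means half the words belong to the keyword phrase, which on a real page would be a strong hint that the keywords are being forced in.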