Everyone has been down that route: trying to build a good search. Most have failed…
The one thing that really annoys me when shopping around is a site search that gives you irrelevant results. For instance, take a search for ‘black pack’. I think you’d agree it is a generic string, and I’d expect it to return ‘backpacks’, ‘daypacks’ and the like, in black. For the experiment, I picked a site at random from Google: gooutdoors.co.uk (you can see the search results for yourself on the site). NB: I am in no way affiliated with gooutdoors.co.uk, and this is not a dig at them or a link back to their site in any way; I have even applied a nofollow tag to the results link.
When you expect to see backpacks and instead get things like “Silva Ranger 3 Compass”, “Lifesystems HeadNet Mosquito Hat” and “Wayfayrer Beef Stew and Dumplings” (sic), you can’t help but think something has gone terribly wrong with the search script. With over 70% of users likely to simply ‘bounce’ off a site when they can’t find what they are after immediately, it is worth looking at why searches return such irrelevant results.
Opening the Lifesystems Mosquito Hat from that result set and scanning for the words ‘pack’ and ‘black’, we find both in the product features:
- Can screw down into a small “stuff pack”
- Ultrafine black mesh
Should these results have been displayed to me? No. Why do we get this problem? Lazy coding. The most basic search practice out there is to do something like:
1. Break string into words.
2. Compose the search query targeting known data fields like title, description and features, word by word, imploding the conditions into the query. At this point the where clause can look like: where (description like '%black%' or features like '%black%' or title like '%black%') and (description like '%pack%' or …) and so on.
3. Display the results and hope for the best.
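To make the failure concrete, here is a minimal PHP sketch of that naive approach. It assumes a products table with title, description and features columns; the field names, function names and sample product are hypothetical. naiveWhere() builds the parameterised where clause word by word, and naiveMatches() applies the same "every word appears somewhere in some field" rule in plain PHP, which shows how the mosquito hat slips through for ‘black pack’.

```php
<?php
// Hypothetical field names; a real shop would use its own schema.
const SEARCH_FIELDS = ['title', 'description', 'features'];

// Step 1: break the search string into words.
function splitWords(string $query): array
{
    return preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);
}

// Step 2: one "(a LIKE ? OR b LIKE ? OR c LIKE ?)" group per word, joined with AND.
// Returns the clause plus the values to bind, e.g. with a PDO prepared statement.
function naiveWhere(string $query): array
{
    $groups = [];
    $params = [];
    foreach (splitWords($query) as $word) {
        // Escape LIKE wildcards typed by the user (assumes backslash is the
        // LIKE escape character, as it is by default in MySQL).
        $pattern = '%' . addcslashes($word, '%_\\') . '%';
        $ors = [];
        foreach (SEARCH_FIELDS as $field) {
            $ors[] = "$field LIKE ?";
            $params[] = $pattern;
        }
        $groups[] = '(' . implode(' OR ', $ors) . ')';
    }
    return [implode(' AND ', $groups), $params];
}

// The same rule in plain PHP: every word must occur as a case-insensitive
// substring of at least one field, mirroring LIKE '%word%' under a
// case-insensitive collation.
function naiveMatches(array $product, string $query): bool
{
    foreach (splitWords($query) as $word) {
        $found = false;
        foreach (SEARCH_FIELDS as $field) {
            if (stripos($product[$field] ?? '', $word) !== false) {
                $found = true;
                break;
            }
        }
        if (!$found) {
            return false;
        }
    }
    return true;
}

$hat = [
    'title'       => 'Lifesystems HeadNet Mosquito Hat',
    'description' => '',
    'features'    => 'Can screw down into a small "stuff pack". Ultrafine black mesh',
];

[$where, $params] = naiveWhere('black pack');
echo 'WHERE ', $where, PHP_EOL;
echo naiveMatches($hat, 'black pack') ? "hat matches\n" : "hat does not match\n";
```

The hat matches because ‘black’ and ‘pack’ each turn up somewhere in its features, even though neither describes what the product is. The same '%pack%' pattern is also what lets ‘backpack’ and ‘daypack’ through, which is why substring matching looks as if it works until it doesn’t.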
There’s a marketing school of thought that you’re better off displaying ‘something’ than no hits, but this is NOT how it’s done. Here is another favourite search of mine that ‘works’ on the site above:
‘the this’: Found 253 product(s), page 1 of 22
It’s fair to say that certain words should not be used to score results; they are just too generic to be considered. Unless it is part of a name like ‘the north face’, ‘the’ should be dismissed, and ‘this’ should be removed in the same way. In fact, over time I have built a database of ‘bad keywords’ to drop from search strings, which you can see in the appendix to this post.
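A minimal sketch of that filtering step, in the same PHP as the appendix. STOPWORDS is a small, hypothetical subset of the appendix list, and PROTECTED_PHRASES is an assumed whitelist that lets a name such as ‘the north face’ keep its ‘the’; neither comes from a real library.

```php
<?php
// Hypothetical subset of the appendix $badwords list.
const STOPWORDS = ['a', 'an', 'and', 'for', 'in', 'of', 'the', 'this', 'with'];
// Hypothetical whitelist: phrases kept whole even though they contain stopwords.
const PROTECTED_PHRASES = ['the north face'];

function cleanSearch(string $query): array
{
    $query = strtolower($query);
    $kept = [];

    // Pull out protected phrases first (whole words only) so their stopwords survive.
    foreach (PROTECTED_PHRASES as $phrase) {
        $regex = '/\b' . preg_quote($phrase, '/') . '\b/';
        if (preg_match($regex, $query)) {
            $kept[] = $phrase;
            $query = preg_replace($regex, ' ', $query);
        }
    }

    // Split on anything that is not a letter, digit or apostrophe, then drop stopwords.
    foreach (preg_split("/[^\\p{L}\\p{N}']+/u", $query, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        if (!in_array($word, STOPWORDS, true)) {
            $kept[] = $word;
        }
    }
    return $kept;
}

print_r(cleanSearch('the this'));              // an empty array: nothing left to search for
print_r(cleanSearch('The North Face jacket')); // ['the north face', 'jacket']
```

Returning an empty array lets the caller show a "no results" message for ‘the this’ instead of 253 random products.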
So, what is the alternative? Oddly enough, I have found that the most accurate search results come from manual tagging backed up by product knowledge. It goes like this:
1. Assign tags to each product. You can build an aliases table for common tags and errors. For example, you want to alias berghaus with berghouse, burghaus etc. (you’d be surprised how many people make mistakes).
2. Build the search algorithm to break the string into parts and analyse them. Drop all common words that won’t help and keep only the ‘useful’ bits (see the bad-words list in the appendix).
3. Treat the words you have left as tags and fetch all products they have been applied to.
4. Refine for relevance by scoring each product on the number of tag hits. If I search for Berghaus RG1 Jacket, that’s a possible score of 3 tag words. If the shop has the RG1 in stock, the listings should ONLY show me that result (or any other 3-point hits) and none of the results scoring 2 or less (jacket + berghaus). If the RG1 is not stocked, this leaves a group of Berghaus jackets (2 points) and a group of other jackets (1 point). Once again, go for relevance and show the first group of results only.
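The four steps can be sketched in PHP roughly as follows. The alias table, the tagged products and tagSearch() are hypothetical stand-ins for the database tables the steps describe, and the search terms are assumed to be already lower-cased and stripped of bad words. The score is simply the number of search tags applied to a product, and only the top-scoring group is returned.

```php
<?php
// Hypothetical alias table: misspelling => canonical tag (step 1).
const ALIASES = ['berghouse' => 'berghaus', 'burghaus' => 'berghaus'];

// Hypothetical tagged catalogue: product name => tags applied to it (step 1).
const PRODUCTS = [
    'Berghaus RG1 Jacket'        => ['berghaus', 'rg1', 'jacket'],
    'Berghaus Waterproof Jacket' => ['berghaus', 'jacket'],
    'Own-Brand Rain Jacket'      => ['jacket'],
    'Silva Ranger 3 Compass'     => ['silva', 'compass'],
];

// $terms: search words already cleaned of bad words (step 2).
function tagSearch(array $terms, array $products): array
{
    // Resolve aliases so common misspellings hit the right tag.
    $tags = [];
    foreach ($terms as $term) {
        $tags[] = ALIASES[$term] ?? $term;
    }
    $tags = array_unique($tags);

    // Steps 3 and 4: fetch tagged products and score each by its tag hits.
    $scores = [];
    foreach ($products as $name => $productTags) {
        $score = count(array_intersect($tags, $productTags));
        if ($score > 0) {
            $scores[$name] = $score;
        }
    }
    if ($scores === []) {
        return [];
    }

    // Show only the most relevant group, never the lower-scoring ones.
    $best = max($scores);
    return array_keys(array_filter($scores, fn (int $s) => $s === $best));
}

print_r(tagSearch(['berghaus', 'rg1', 'jacket'], PRODUCTS)); // ['Berghaus RG1 Jacket']

// RG1 out of stock: the Berghaus jackets (score 2) win over plain jackets (score 1).
$inStock = PRODUCTS;
unset($inStock['Berghaus RG1 Jacket']);
print_r(tagSearch(['berghouse', 'rg1', 'jacket'], $inStock)); // ['Berghaus Waterproof Jacket']
```

Note that the misspelt ‘berghouse’ still finds the Berghaus jacket through the alias table, and the compass never appears because none of its tags were searched for.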
Advantages: always returns the right, relevant results, provided the products are well maintained.
Disadvantages: it needs to be managed and updated, and people’s mistakes in searches need to be monitored and allowed for.
Bottom line: the increased conversion ratio will justify the man-hours put into tagging your product base. It’s a credit crunch; we all need to work harder!
I hope this gives you some ideas anyway.
Here is a suggested list of ‘bad words’ that can be safely dropped from search strings:
$badwords = array( "a", "a's", "able", "about", "above", "according", "accordingly", "across", "actually", "afterwards", "again", "against", "ain't", "all", "allow", "allows", "almost", "alone", "along", "already", "also", "although", "always", "am", "among", "amongst", "an", "and", "another", "any", "anybody", "anyhow", "anyone", "anything", "anyway", "anyways", "anywhere", "apart", "appear", "appreciate", "appropriate", "are", "aren't", "around", "as", "aside", "ask", "asking", "associated", "at", "available", "away", "awfully", "b", "be", "became", "because", "become", "becomes", "becoming", "been", "before", "beforehand", "behind", "being", "believe", "below", "beside", "besides", "best", "better", "between", "beyond", "both", "brief", "but", "by", "c", "c'mon", "c's", "came", "can", "can't", "cannot", "cant", "cause", "causes", "certain", "certainly", "changes", "clearly", "co", "com", "come", "comes", "concerning", "consequently", "consider", "considering", "contain", "containing", "contains", "corresponding", "could", "couldn't", "course", "currently", "d", "definitely", "described", "despite", "did", "didn't", "different", "do", "does", "doesn't", "doing", "don't", "done", "down", "downwards", "during", "e", "each", "edu", "eg", "eight", "either", "else", "elsewhere", "enough", "entirely", "especially", "et", "etc", "even", "ever", "every", "everybody", "everyone", "everything", "everywhere", "ex", "exactly", "example", "except", "f", "far", "few", "fifth", "first", "five", "followed", "following", "follows", "for", "former", "formerly", "forth", "four", "from", "further", "furthermore", "g", "get", "gets", "getting", "given", "gives", "go", "goes", "going", "gone", "got", "gotten", "greetings", "h", "had", "hadn't", "happens", "hardly", "has", "hasn't", "have", "haven't", "having", "he", "he's", "hello", "help", "hence", "her", "here", "here's", "hereafter", "hereby", "herein", "hereupon", "hers", "herself", "hi", "him", "himself", "his", "hither", 
"hopefully", "how", "howbeit", "however", "i", "i'd", "i'll", "i'm", "i've", "ie", "if", "ignored", "immediate", "in", "inasmuch", "inc", "indeed", "indicate", "indicated", "indicates", "inner", "insofar", "instead", "into", "inward", "is", "isn't", "it", "it'd", "it'll", "it's", "its", "itself", "j", "just", "k", "keep", "keeps", "kept", "know", "knows", "known", "l", "last", "lately", "later", "latter", "latterly", "least", "less", "lest", "let", "let's", "like", "liked", "likely", "little", "look", "looking", "looks", "ltd", "m", "mainly", "many", "may", "maybe", "me", "mean", "meanwhile", "merely", "might", "more", "moreover", "most", "mostly", "much", "must", "my", "myself", "n", "name", "namely", "nd", "near", "nearly", "necessary", "need", "needs", "neither", "never", "nevertheless", "new", "next", "nine", "no", "nobody", "non", "none", "noone", "nor", "normally", "not", "nothing", "novel", "now", "nowhere", "o", "obviously", "of", "off", "often", "oh", "ok", "okay", "old", "on", "once", "one", "ones", "only", "onto", "or", "other", "others", "otherwise", "ought", "our", "ours", "ourselves", "out", "outside", "over", "overall", "own", "p", "particular", "particularly", "per", "perhaps", "placed", "please", "plus", "possible", "presumably", "probably", "provides", "q", "que", "quite", "qv", "r", "rather", "rd", "re", "really", "reasonably", "regarding", "regardless", "regards", "relatively", "respectively", "right", "s", "said", "same", "saw", "say", "saying", "says", "second", "secondly", "see", "seeing", "seem", "seemed", "seeming", "seems", "seen", "self", "selves", "sensible", "sent", "serious", "seriously", "seven", "several", "shall", "she", "should", "shouldn't", "since", "six", "so", "some", "somebody", "somehow", "someone", "something", "sometime", "sometimes", "somewhat", "somewhere", "soon", "sorry", "specified", "specify", "specifying", "still", "sub", "such", "sup", "sure", "t", "t's", "take", "taken", "tell", "tends", "th", "than", "thank", 
"thanks", "thanx", "that", "that's", "thats", "the", "their", "theirs", "them", "themselves", "then", "thence", "there", "there's", "thereafter", "thereby", "therefore", "therein", "theres", "thereupon", "these", "they", "they'd", "they'll", "they're", "they've", "think", "third", "this", "thorough", "thoroughly", "those", "though", "three", "through", "throughout", "thru", "thus", "to", "together", "too", "took", "toward", "towards", "tried", "tries", "truly", "try", "trying", "twice", "two", "u", "un", "under", "unfortunately", "unless", "unlikely", "until", "unto", "up", "upon", "us", "use", "used", "useful", "uses", "using", "usually", "v", "value", "various", "very", "via", "viz", "vs", "w", "want", "wants", "was", "wasn't", "way", "we", "we'd", "we'll", "we're", "we've", "welcome", "well", "went", "were", "weren't", "what", "what's", "whatever", "when", "whence", "whenever", "where", "where's", "whereafter", "whereas", "whereby", "wherein", "whereupon", "wherever", "whether", "which", "while", "whither", "who", "who's", "whoever", "whole", "whom", "whose", "why", "will", "willing", "wish", "with", "within", "without", "won't", "wonder", "would", "wouldn't", "x", "y", "yes", "yet", "you", "you'd", "you'll", "you're", "you've", "your", "yours", "yourself", "yourselves", "z" );