Follow me: @D_mitar

Nov 12th 2009 × PHP debugging: breadcrumb all functions and their callees from anywhere

This is just a little snippet that I wrote whilst trying to debug some dodgy MySQL calls that were causing problems. It lets you backtrace all the functions and files that led to the problem, mapping them out as a breadcrumb of sorts. REALLY helpful when tracing other people’s code, too…

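function get_callee() {
    // relies on debug_backtrace(); frame 0 is get_callee() itself, so start at 1
    $crumbs = array();
    $backtrace = debug_backtrace();
    for ($ii = 1; $ii < count($backtrace); ++$ii) {
        // 'file' and 'line' can be missing for internal calls and callbacks
        $file = isset($backtrace[$ii]['file']) ? basename($backtrace[$ii]['file']) : '[internal]';
        $line = isset($backtrace[$ii]['line']) ? $backtrace[$ii]['line'] : '?';
        $crumbs[] = $file . ' - ' . $backtrace[$ii]['function'] . ' (' . $line . ')';
    }
    return implode(" > ", $crumbs);
} // end get_callee();

// use simply like so, from inside the function you are debugging
// (called from the top level of a script there are no callers, so it returns an empty string):
echo get_callee();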

I hope it helps somebody.


Feb 16th 2009 × E-commerce and product search algorithms: get relevant search results, not just random DB dumps

Everyone has been down that route – trying to make a good search. Most have failed…

The one thing that really annoys me when shopping around is a site search that gives you irrelevant results. For instance, take a search for ‘black pack’ – I think you’d agree it is a generic string for which I’d expect to get ‘backpacks’, ‘daypacks’ and the like, in black. For the experiment, I have chosen a site at random from Google: gooutdoors.co.uk (see the search results for yourselves here, in a new window). NB: I am in no way affiliated with gooutdoors.co.uk, and this is not a dig at them or a link back for their site in any way; I have even applied a nofollow tag to the results link.

When you expect to see backpacks and instead get things like “Silva Ranger 3 Compass”, “Lifesystems HeadNet Mosquito Hat” and “Wayfayrer Beef Stew and Dumplings” (sic), you can’t help but think something has gone terribly wrong with the search script. With over 70% of users likely to just ‘bounce’ off such a site after not being able to find what they were after immediately, we need to look into why such irrelevant search results come back.

Upon clicking on the Lifesystems Mosquito Hat from the above result set and scanning for the words ‘pack’ and ‘black’, we notice them within the product features:

  • Can screw down into a small “stuff pack”

  • Ultrafine black mesh

Should these results have been displayed to me? No. Why do we get this problem? Lazy coding. The most basic search practice out there is to do something like:

1. Break string into words.
2. Compose the search query word by word against known data fields like title, description and features, imploding the pieces into the query. At this point the WHERE clause can look like: where (description like '%black%' or features like '%black%' or title like '%black%') and (description like '%pack%' … etc etc)
3. Display the results and hope for the best.
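
To make the problem concrete, here is a rough sketch of what those three steps tend to boil down to in PHP. The function name naive_search_where() is made up for this illustration and the column names are simply the ones from the example above; it is not a claim about what any particular site actually runs. It returns the WHERE clause with placeholders plus the values to bind, so it could be fed to a prepared statement.

```php
<?php
// Hypothetical helper: builds the naive WHERE clause described in the steps above.
// The column names are assumptions taken from the example, not a real schema.
function naive_search_where($search, array $fields = array('description', 'features', 'title')) {
    // Step 1: break the string into words.
    $words = preg_split('/\s+/', trim($search), -1, PREG_SPLIT_NO_EMPTY);

    // Step 2: one OR-group per word, all groups joined with AND.
    $groups = array();
    $params = array();
    foreach ($words as $word) {
        $likes = array();
        foreach ($fields as $field) {
            $likes[]  = $field . ' LIKE ?';
            $params[] = '%' . $word . '%';
        }
        $groups[] = '(' . implode(' OR ', $likes) . ')';
    }

    return array(implode(' AND ', $groups), $params);
}

list($where, $params) = naive_search_where('black pack');
echo $where, "\n";
print_r($params);
```

Every word only has to appear somewhere in any of the fields, which is exactly why a mosquito hat with ‘black mesh’ and a ‘stuff pack’ in its features comes back for ‘black pack’.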

There’s a marketing school of thought here that you’re better off displaying ‘something’ than no hits – but this is NOT how it’s done. Here is another favourite search of mine that works on the site above, ‘the this’:
Found 253 product(s) – page 1 of 22

It’s fair to say that certain words should not be used to score results; they are just too generic to be considered. Unless you are typing something like ‘the north face’, ‘the’ should be dismissed, and ‘this’ should be removed in the same way. In fact, over time I have built a database of ‘bad keywords’ to drop from search strings, which you can see as an appendix to this post.
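
A minimal sketch of that filtering step, under a few assumptions: strip_badwords() is a made-up name, the $badwords array here is only a tiny sample of the appendix list, and the $protected list of phrases (to cover cases like ‘the north face’) is my own illustration of one way to handle the exception, not the only one.

```php
<?php
// Hypothetical helper: drops generic words from a search string.
// $protected holds lowercase phrases whose words must survive even if they are bad words.
function strip_badwords($search, array $badwords, array $protected = array()) {
    $search = strtolower(trim($search));

    // Collect the words of any protected phrase found in the search.
    // Note: strpos() is a plain substring check, good enough for a sketch.
    $keep = array();
    foreach ($protected as $phrase) {
        if (strpos($search, $phrase) !== false) {
            $keep = array_merge($keep, explode(' ', $phrase));
        }
    }

    $result = array();
    foreach (preg_split('/\s+/', $search, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        if (in_array($word, $keep, true) || !in_array($word, $badwords, true)) {
            $result[] = $word;
        }
    }
    return $result;
}

$badwords  = array('the', 'this', 'a', 'in', 'for'); // sample only
$protected = array('the north face');

print_r(strip_badwords('the this', $badwords, $protected));
print_r(strip_badwords('The North Face jacket in black', $badwords, $protected));
```

With this, ‘the this’ leaves nothing to search on, so the script can say so instead of returning 253 products, while ‘The North Face jacket in black’ keeps the, north, face, jacket and black.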

So, what is the alternative? Oddly enough, I have found the most accurate search results are achieved via manual tagging, backed up by product knowledge. It goes like this:

1. Assign tags to each product. You can build an aliases table for common tags and errors. For example, you want to alias misspellings like berghouse and burghaus to berghaus (you’d be surprised how many people make mistakes).
2. Build the search algorithm to break down the string into parts and analyse them. Drop all common words that won’t help and keep the ‘useful’ bits only (see the bad words list below).
3. Treat the words you have left as tags and fetch all products they have been applied to.
4. Refine for relevance. This is done by scoring each product on the number of tag hits. Basically, if I search for Berghaus RG1 Jacket, that’s a possible score of 3 tag words. If the shop has the RG1 in stock, the listings should ONLY show me that result (or any other 3 point hits) and none of the results with 2 or fewer (jacket + berghaus). If the RG1 is not being stocked, this leaves a group of jackets by Berghaus (2 points) and a group of other jackets (1 point). Once again, go for relevance and show the first group of results only.
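
The steps above can be sketched roughly as follows. Everything here is an assumption for the sake of illustration: the function names, the in-memory arrays standing in for the tag and alias tables, and the product names apart from the RG1. In a real shop the same idea would run against the database, after the bad words have already been dropped.

```php
<?php
// Hypothetical helper: maps a search word to its canonical tag via the alias table.
function normalise_tag($word, array $aliases) {
    $word = strtolower($word);
    return isset($aliases[$word]) ? $aliases[$word] : $word;
}

// Hypothetical helper: returns only the products in the top-scoring group.
// $productTags maps a product name to the list of tags applied to it.
function search_by_tags(array $words, array $productTags, array $aliases) {
    $wanted = array();
    foreach ($words as $word) {
        $wanted[] = normalise_tag($word, $aliases);
    }
    $wanted = array_unique($wanted);

    // Score = how many of the searched tags have been applied to the product.
    $scores = array();
    foreach ($productTags as $product => $tags) {
        $score = count(array_intersect($wanted, $tags));
        if ($score > 0) {
            $scores[$product] = $score;
        }
    }
    if (!$scores) {
        return array();
    }

    // Refine for relevance: keep the best group only, drop everything below it.
    return array_keys($scores, max($scores));
}

// Made-up sample data.
$aliases = array('berghouse' => 'berghaus', 'burghaus' => 'berghaus');
$productTags = array(
    'Berghaus RG1 Jacket'           => array('berghaus', 'rg1', 'jacket'),
    'Berghaus jacket (other model)' => array('berghaus', 'jacket'),
    'Unbranded jacket'              => array('jacket'),
);

echo implode(', ', search_by_tags(array('Berghouse', 'RG1', 'jacket'), $productTags, $aliases)), "\n";

// If the RG1 is not stocked, the 2-point Berghaus jackets become the best group.
unset($productTags['Berghaus RG1 Jacket']);
echo implode(', ', search_by_tags(array('Berghouse', 'RG1', 'jacket'), $productTags, $aliases)), "\n";
```

The first search shows only the RG1; once it is removed from stock, the same search falls back to the other Berghaus jacket and still hides the unbranded one.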

Advantages: gets the right and relevant results, provided the product tags are well maintained.
Disadvantages: needs to be managed and kept up to date, and you need to monitor searches for people’s mistakes and allow for them.
Bottom line: The increased conversion ratio will justify the man hours put into tagging your product base. It’s a credit crunch, we all need to work harder!

I hope this gives you some ideas anyway.

Here is a suggested list of ‘bad words’ that can be safely dropped from search strings:

$badwords = array(
"a", "a's", "able", "about", "above", "according", "accordingly", "across", "actually",
"afterwards", "again", "against", "ain't", "all", "allow", "allows", "almost", "alone",
"along", "already", "also", "although", "always", "am", "among", "amongst", "an", "and",
"another", "any", "anybody", "anyhow", "anyone", "anything", "anyway", "anyways", "anywhere",
"apart", "appear", "appreciate", "appropriate", "are", "aren't", "around", "as", "aside",
"ask", "asking", "associated", "at", "available", "away", "awfully", "b", "be", "became",
"because", "become", "becomes", "becoming", "been", "before", "beforehand", "behind",
"being", "believe", "below", "beside", "besides", "best", "better", "between", "beyond",
"both", "brief", "but", "by", "c", "c'mon", "c's", "came", "can", "can't", "cannot", "cant",
"cause", "causes", "certain", "certainly", "changes", "clearly", "co", "com", "come", "comes",
"concerning", "consequently", "consider", "considering", "contain", "containing", "contains",
"corresponding", "could", "couldn't", "course", "currently", "d", "definitely", "described",
"despite", "did", "didn't", "different", "do", "does", "doesn't", "doing", "don't", "done",
"down", "downwards", "during", "e", "each", "edu", "eg", "eight", "either", "else",
"elsewhere", "enough", "entirely", "especially", "et", "etc", "even", "ever", "every",
"everybody", "everyone", "everything", "everywhere", "ex", "exactly", "example", "except",
"f", "far", "few", "fifth", "first", "five", "followed", "following", "follows", "for",
"former", "formerly", "forth", "four", "from", "further", "furthermore", "g", "get", "gets",
"getting", "given", "gives", "go", "goes", "going", "gone", "got", "gotten", "greetings",
"h", "had", "hadn't", "happens", "hardly", "has", "hasn't", "have", "haven't", "having",
"he", "he's", "hello", "help", "hence", "her", "here", "here's", "hereafter", "hereby",
"herein", "hereupon", "hers", "herself", "hi", "him", "himself", "his", "hither",
"hopefully", "how", "howbeit", "however", "i", "i'd", "i'll", "i'm", "i've", "ie", "if",
"ignored", "immediate", "in", "inasmuch", "inc", "indeed", "indicate", "indicated",
"indicates", "inner", "insofar", "instead", "into", "inward", "is", "isn't", "it",
"it'd", "it'll", "it's", "its", "itself", "j", "just", "k", "keep", "keeps", "kept",
"know", "knows", "known", "l", "last", "lately", "later", "latter", "latterly", "least",
"less", "lest", "let", "let's", "like", "liked", "likely", "little", "look", "looking",
"looks", "ltd", "m", "mainly", "many", "may", "maybe", "me", "mean", "meanwhile", "merely",
"might", "more", "moreover", "most", "mostly", "much", "must", "my", "myself", "n", "name",
"namely", "nd", "near", "nearly", "necessary", "need", "needs", "neither", "never",
"nevertheless", "new", "next", "nine", "no", "nobody", "non", "none", "noone", "nor",
"normally", "not", "nothing", "novel", "now", "nowhere", "o", "obviously", "of", "off",
"often", "oh", "ok", "okay", "old", "on", "once", "one", "ones", "only", "onto", "or",
"other", "others", "otherwise", "ought", "our", "ours", "ourselves", "out", "outside",
"over", "overall", "own", "p", "particular", "particularly", "per", "perhaps", "placed",
"please", "plus", "possible", "presumably", "probably", "provides", "q", "que", "quite",
"qv", "r", "rather", "rd", "re", "really", "reasonably", "regarding", "regardless",
"regards", "relatively", "respectively", "right", "s", "said", "same", "saw", "say",
"saying", "says", "second", "secondly", "see", "seeing", "seem", "seemed", "seeming",
"seems", "seen", "self", "selves", "sensible", "sent", "serious", "seriously", "seven",
"several", "shall", "she", "should", "shouldn't", "since", "six", "so", "some", "somebody",
"somehow", "someone", "something", "sometime", "sometimes", "somewhat", "somewhere", "soon",
"sorry", "specified", "specify", "specifying", "still", "sub", "such", "sup", "sure", "t",
"t's", "take", "taken", "tell", "tends", "th", "than", "thank", "thanks", "thanx", "that",
"that's", "thats", "the", "their", "theirs", "them", "themselves", "then", "thence", "there",
"there's", "thereafter", "thereby", "therefore", "therein", "theres", "thereupon", "these",
"they", "they'd", "they'll", "they're", "they've", "think", "third", "this", "thorough",
"thoroughly", "those", "though", "three", "through", "throughout", "thru", "thus", "to",
"together", "too", "took", "toward", "towards", "tried", "tries", "truly", "try", "trying",
"twice", "two", "u", "un", "under", "unfortunately", "unless", "unlikely", "until", "unto",
"up", "upon", "us", "use", "used", "useful", "uses", "using", "usually", "v", "value",
"various", "very", "via", "viz", "vs", "w", "want", "wants", "was", "wasn't", "way", "we",
"we'd", "we'll", "we're", "we've", "welcome", "well", "went", "were", "weren't", "what",
"what's", "whatever", "when", "whence", "whenever", "where", "where's", "whereafter", "whereas",
"whereby", "wherein", "whereupon", "wherever", "whether", "which", "while", "whither", "who",
"who's", "whoever", "whole", "whom", "whose", "why", "will", "willing", "wish", "with",
"within", "without", "won't", "wonder", "would", "wouldn't", "x", "y", "yes", "yet",
"you", "you'd", "you'll", "you're", "you've", "your", "yours", "yourself", "yourselves", "z"
);

Oct 3rd 2008 × A worthy PHP framework? Probably not, but it is as close as anything else I have seen: CodeIgniter

I have always regarded PHP frameworks that adhered to MVC standards and templating engines with inherent mistrust… No, that’s putting it too mildly. In fact, I really really really hated the idea. Let’s face it, somebody coming up with a legitimate way / reason to turn a simple page update on a site into a process that involves editing 4-5 files within the framework must have been paid by the hour… And the rest of the world must have been sleeping to accept this as the norm. I know in my production environment, where a quick reaction and turnaround is often critical, any delay is going to be considered a disaster (never-you-mind financial repercussions).

A typical example of a good idea gone horribly horribly wrong is eZ Publish – an award winning open source CMS based on PHP, with its own framework and templating language. What-the-hell? Having actually worked in a company that dealt exclusively in producing eZ Publish websites and become a ‘certified eZ Now professional’, I can’t say I have ever seen anything so clumsy and awkward to develop for.
Hi, Tony Wood / VisionWT – I miss you guys!

Anyway, got sidetracked there… Was looking through some potential contract jobs today when I saw an ad for a PHP / AJAX coder using CodeIgniter. Since it pays 450 pounds a day, I actually made an effort and looked it up, and it turned out to be a WINNAR (anybody remember Jeff K?). I just finished watching the two excellent getting started video tutorials and they left me almost wanting to download CodeIgniter.

What set it apart? The ability to actually use inline PHP inside the views and not have to learn some crappy inferior templating language. It means that, if in a hurry, I can quickly code a hack and get it to work without too much messing around in a number of files (well, perhaps 2). Once happy with it and with the pressure off, I can move the change and optimise it in a ‘semantic’ fashion to my heart’s contempt…

It deserves a mention, so here is a plug using their own list. CodeIgniter is right for you if:

  • You want a framework with a small footprint.
  • You need exceptional performance.
  • You need broad compatibility with standard hosting accounts that run a variety of PHP versions and configurations.
  • You want a framework that requires nearly zero configuration.
  • You want a framework that does not require you to use the command line.
  • You want a framework that does not require you to adhere to restrictive coding rules.
  • You are not interested in large-scale monolithic libraries like PEAR.
  • You do not want to be forced to learn a templating language (although a template parser is optionally available if you desire one).
  • You eschew complexity, favoring simple solutions.
  • You need clear, thorough documentation.

If even half that list is true, we still have a PHP framework that’s worth remembering – just on the off-chance you ever need one.
