Today I needed to quickly find out a list of node's which did not contain a certain token (one we use on PPOnline to insert adverts into articles). There were meant to be 2 of these tokens per article but we had noticed some only had one. Some didn't have any at all!
Now, this might already exist as a neat function somewhere, but I couldn't find it. So I wrote this little query to find the number of occurrences of a substring in a larger block of text.
I did look into using the SUBSTRING() and SUBSTRING_INDEX() commands, however I couldn't see a way of efficiently using them to find any number of occurrences. So I settled on the following…
SET @findme="MySQL";
SELECT
n.nid, n.title,
CAST((LENGTH(nr.body) - LENGTH(REPLACE(nr.body, @findme, ""))) / LENGTH(@findme) AS UNSIGNED) AS findme_count
FROM node n
INNER JOIN node_revisions nr ON nr.vid = n.vid
WHERE n.type = "blog"
ORDER BY findme_count DESC;
The above example will list all nodes of type blog. It will then compare the length of the body before and after removing all instances of the string you're looking for. The result should be n * LENGTH(find_me)
where n
is the number of occurrences. This is why we divide by the length of the find_me
string.
This is a pretty fast query too. On our live database, it searched over 1,400 nodes in just over 31 milliseconds.
I hope this saves someone some time in the future!