
DNN-Connect Blogs

An Essay on Auto-Converting Titles to Paths like WordPress

When we create apps like blogs, articles or news, we often need to generate a nice, SEO-style path for the details page, which should contain the title. This looks easy, but I spent over a day on this simple challenge and would like to share what I worked out…

Here's a super-short video explaining why this is important…

So basically the challenge is converting all kinds of titles to paths. Just replacing bad characters doesn't come close to delivering a useful solution. Let's look at some common issues:

  1. Umlauts: just stripping them would mangle the word. For example, "große Küchengeräte" should result in a URL like "grosse-kuechengeraete"
  2. Multiple "bad" characters, like "We +/- love this" would result in something like "We-----love-this"
  3. Leading / trailing spaces or special characters like "Learn Grunt (200)" should NOT result in "Learn-Grunt-200-"
  4. …especially in combination with path-characters (if you allow them) like "catalog/-best-mixer-ever"
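To make these issues concrete, here is a minimal sketch of the naive "just replace bad characters" approach. The function name `naiveSlug` is made up for illustration and is not part of 2sxc.

```javascript
// Naive approach: replace every character that is not an ASCII letter or digit with "-".
// Hypothetical helper, only meant to show why this is not good enough.
function naiveSlug(title) {
  return title.replace(/[^A-Za-z0-9]/g, "-");
}

console.log(naiveSlug("We +/- love this"));   // We-----love-this
console.log(naiveSlug("Learn Grunt (200)"));  // Learn-Grunt--200-
console.log(naiveSlug("große Küchengeräte")); // gro-e-K-chenger-te
```

Repeated dashes, a trailing dash and mangled umlauts all show up right away.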

I needed to get this worked out, because 2sxc 8.3.5 provides a new input-field called "string-url-path" which will auto-fill from one or more other fields. So the designer can specify it to fill from "[Title]" or more advanced cases like "[Category]/[Title]" and everything else must just happen.
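As a rough illustration of how such a mask could be read, the following sketch lists the field names a mask refers to, so a UI would know which fields to watch for changes. The helper `fieldsInMask` is an assumption for this post, not the actual 2sxc API.

```javascript
// Hypothetical helper: lists the field names referenced in a mask such as
// "[Category]/[Title]". Each name is returned once, in order of appearance.
function fieldsInMask(mask) {
  const names = [];
  const pattern = /\[([^\]]+)\]/g;
  let match;
  while ((match = pattern.exec(mask)) !== null) {
    if (!names.includes(match[1])) names.push(match[1]);
  }
  return names;
}

console.log(fieldsInMask("[Title]").join(", "));            // Title
console.log(fieldsInMask("[Category]/[Title]").join(", ")); // Category, Title
```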

If you're interested in the code, check out the JavaScript code on GitHub. But here's a short explanation of what I did:

  1. Before even starting, get the fields like Category and Title and remove slashes inside each. The reason is that the final result may contain slashes (because category/name can have a slash), but a slash inside an individual piece could cause trouble.
  2. Merge the result based on the mask (like [Category]/[Title])
  3. Lowercase everything
  4. Latinize everything - I created an Angular service which does this for me, converting 1000+ "bad" characters like "áűőú" or "ǽ" to simpler characters. If you want to use it, you can find my AngularJS latinize-text-service here
  5. Neutralize apostrophe-s combinations like "Daniel's cat" to "Daniels cat", because I don't want it to end up as "daniel-s-cat" in the URL. But I also don't want to capture "she said 'super'" just because an apostrophe happens to be followed by an s in normal content
  6. Rotate all bad slashes \ to /
  7. Replace all unwanted characters including spaces with "-"
  8. Remove duplicate "-" and duplicate "/" in case they were created by previous conversions
  9. Replace all side-by-side variations of "-" and "/" with a simple "/", as they can easily be generated by previous conversions. This prevents things like "(beta) Learn Gulp (200)" from resulting in a "blog/-beta-learn-gulp-200-" URL
  10. And finally trim leading and trailing "-" characters
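The steps above can be sketched roughly as follows. This is not the 2sxc code from GitHub, just a minimal illustration under a few assumptions: the names `titleToPath`, `latinizeSketch` and `LATIN_SAMPLE` are made up, the latinize map covers only a handful of characters instead of 1000+, and slashes inside a field are replaced with "-" rather than dropped.

```javascript
// Tiny sample map (assumption); a real latinize service covers far more characters.
const LATIN_SAMPLE = new Map([
  ["ä", "ae"], ["ö", "oe"], ["ü", "ue"], ["ß", "ss"],
  ["á", "a"], ["ű", "u"], ["ő", "o"], ["ú", "u"], ["ǽ", "ae"],
]);

// Replaces known non-ASCII characters; unknown ones are left for step 7 to handle.
function latinizeSketch(text) {
  return text.replace(/[^\x00-\x7F]/g, (ch) =>
    LATIN_SAMPLE.has(ch) ? LATIN_SAMPLE.get(ch) : ch);
}

function titleToPath(mask, fields) {
  // 1 + 2. Neutralize slashes inside each field, then merge the fields into the mask.
  let path = mask.replace(/\[([^\]]+)\]/g, (match, name) => {
    const value = fields[name];
    return (value == null ? "" : String(value)).replace(/[\/\\]/g, "-");
  });
  // 3. Lowercase everything.
  path = path.toLowerCase();
  // 4. Latinize.
  path = latinizeSketch(path);
  // 5. Neutralize apostrophe-s: "daniel's cat" becomes "daniels cat".
  path = path.replace(/(\w)['’]s\b/g, "$1s");
  // 6. Rotate backslashes to forward slashes.
  path = path.replace(/\\/g, "/");
  // 7. Replace all unwanted characters, including spaces, with "-".
  path = path.replace(/[^a-z0-9\/-]/g, "-");
  // 8. Collapse duplicate "-" and duplicate "/".
  path = path.replace(/-{2,}/g, "-").replace(/\/{2,}/g, "/");
  // 9. Any run of "-" and "/" that contains a "/" becomes a single "/".
  path = path.replace(/[-\/]*\/[-\/]*/g, "/");
  // 10. Trim leading and trailing "-".
  return path.replace(/^-+|-+$/g, "");
}

console.log(titleToPath("[Title]", { Title: "große Küchengeräte" }));
// grosse-kuechengeraete
console.log(titleToPath("blog/[Title]", { Title: "(beta) Learn Gulp (200)" }));
// blog/beta-learn-gulp-200
console.log(titleToPath("[Category]/[Title]", { Category: "Catalog", Title: " Best Mixer ever" }));
// catalog/best-mixer-ever
```

The order matters: the apostrophe step has to run before step 7, otherwise "daniel's" would already have become "daniel-s".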

Usually you want to do this in JavaScript, because you want the UI to show the result immediately and potentially allow the user to overwrite the resulting URL, ideally also trapping their input and preventing them from adding bad stuff. So feel free to use my code. Suggestions are also welcome.
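For the "trapping input" part, a minimal sketch could look like this. The function `sanitizeTypedPath` is hypothetical, and it assumes that a trailing "-" or "/" should survive while the user is still typing, so the next word or segment can be added; a final cleanup on save would trim it. Latinizing is left out to keep the sketch short.

```javascript
// Hypothetical sanitizer for manual edits of the path field.
// Assumption: trailing "-" or "/" is kept while typing; only leading "-" is trimmed.
function sanitizeTypedPath(input) {
  return input
    .toLowerCase()
    .replace(/\\/g, "/")                 // rotate backslashes
    .replace(/[^a-z0-9\/-]/g, "-")       // unwanted characters become "-"
    .replace(/-{2,}/g, "-")              // collapse duplicate "-"
    .replace(/[-\/]*\/[-\/]*/g, "/")     // runs of "-" and "/" around a slash become "/"
    .replace(/^-+/, "");                 // trim leading "-"
}

console.log(sanitizeTypedPath("My Blog/New Post ")); // my-blog/new-post-
console.log(sanitizeTypedPath("--Hello  World"));    // hello-world
```

In a real UI this would typically be called from the field's input event handler, writing the cleaned value back into the field.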

Love from Switzerland,
Daniel


Daniel Mettler grew up in the jungles of Indonesia and is founder and CEO of 2sic internet solutions in Switzerland and Liechtenstein, a 20-person web specialist with over 800 DNN projects since 1999. He is also chief architect of 2sxc (see forge), an open source module for creating attractive content and DNN Apps.



Daniel Mettler learned programming on his parents' bible translation computer, deep in the jungles of Indonesia. Since he was only 12 years old at that time and the BIOS only had a version of BASICA, that's what got him started. At 16 he went back to Switzerland and learned German and basic city-survival skills. Equipped with this know-how, he founded 2sic internet solutions in 1999, which was focused on web solutions on the Microsoft platform. After a few self-developed CMSs, 2sic switched to DNN in 2003 and has been one of the largest partners (17 employees, 700+ projects) in Europe. Daniel is also the chief architect behind the open source 2sxc, a strong promoter of standardization (Bootstrap, patterns, AngularJS, checklists, etc.) and loves to eat anything - dead or alive. His motto: if the natives eat it, it's game.