I read articles about UX evaluation methods on a daily basis, have collected several UX guides on my desk, and my mailbox is crammed with offers for pricey software solutions. Yet it took some off-the-job training as a usability engineer for me to stumble upon a simple survey that is both scientifically constructed and completely free of charge. I had never read anything about the UEQ before this university training, which is why I would like to share my experience benchmarking and tracking user experience across releases.
It works with semantic differentials: your users fill out a survey consisting of 26 contrasting adjective pairs. Those randomly ordered word pairs represent six scales that are crucial for good UX (*these scales cover some of Nielsen's heuristics for interface design and, with novelty and stimulation, add more experience-driven items):
Attractiveness rates the overall aesthetics of an application and how appealing users find it. Perspicuity shows how easily people understand the product, efficiency captures how quickly and with how little effort they can get their tasks done, and dependability gives an idea of how trustworthy and predictable the product seems. The joy of use is measured by the stimulation scale, and novelty represents how innovative a tool is perceived to be.
To get a broader sense of your product's qualities, the UEQ provides a score that rates the overall performance in:
Sounds like you need to be a statistics pro to analyse all of this? Well, no… The most wonderful thing about the UEQ is that it comes with an Excel analysis tool, so there's no need to be a pro in statistics or math.
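If you would rather script it than open Excel, here is a minimal sketch of the kind of scoring the analysis tool performs, assuming answers are recorded on the usual 7-point scale (1–7). The item-to-scale mapping and the "reversed" flags below are purely illustrative placeholders, not the official UEQ assignment.

```python
# Minimal sketch of UEQ-style scoring (illustrative, not the official tool).
# Assumes raw answers on a 1-7 scale; the item -> scale mapping and the
# "reversed" flags are placeholders -- use the official UEQ assignment.

from statistics import mean

# Hypothetical mapping: item index -> (scale name, negative pole listed first?)
ITEMS = {
    1: ("Attractiveness", False),
    2: ("Perspicuity", True),
    3: ("Efficiency", False),
    4: ("Dependability", False),
    5: ("Stimulation", True),
    6: ("Novelty", False),
    # ... items 7-26 would follow in a real mapping
}

def score_answers(answers: dict[int, int]) -> dict[str, float]:
    """Transform 1-7 answers to -3..+3 and average them per scale."""
    per_scale: dict[str, list[int]] = {}
    for item, raw in answers.items():
        scale, reversed_item = ITEMS[item]
        value = raw - 4                   # centre the 7-point answer on 0
        if reversed_item:                 # flip items whose negative pole comes first
            value = -value
        per_scale.setdefault(scale, []).append(value)
    return {scale: round(mean(values), 2) for scale, values in per_scale.items()}

# One participant's (shortened) answer sheet:
print(score_answers({1: 6, 2: 2, 3: 5, 4: 7, 5: 3, 6: 4}))
```

The Excel tool does this (and more) for you; the sketch is only meant to show that the math behind the scale scores is nothing to be afraid of.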
I got some of my friends together for a usability test and asked them to play a game of 301 with each of the five apps I had chosen. During their games I observed and listened to what they had to say about the usability (think-aloud method). Thanks to those usability tests I gained quite a few insights into what matters in this kind of app and saved myself a bit of work on the feature audit. After every game I handed out the UEQ sheets for each app.
As you can see, there are big differences in how the users rated the apps. Now you can compare those numbers to the design and interpret them. "Dartsmind" got better overall results in every group than any other app I benchmarked. The black lines indicate the maximum and minimum values given, and you can see that all answers, even the relatively low ones, are in the positive spectrum.
"Let’s Dart" on the other hand got very mixed reviews. Except from Efficiency and Dependability every other scale is negative. With some products you might not care about the other scales because being efficient and dependable might be the only thing that matters for you. But since we are talking about leisure apps, stimulation and attractiveness should count as well. Also the answers fluctuated dramatically — so it would be interesting to investigate which kind of users rated which scales positive and vice versa.
I did the benchmarking with just three participants. To get statistically significant results I would of course need to validate the apps with more users. But even with this fairly small sample it became obvious that "Dartsmind" performs pretty well with regard to its design, since it got all positive values and not too much fluctuation in the answers between different users.
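If you want to reproduce the numbers behind the bars and the black min/max lines for your own data, a small aggregation along these lines does the job. The per-participant scale scores are assumed to come out of a scoring step like the one sketched earlier; the values here are made up for illustration.

```python
# Sketch: aggregate per-participant scale scores into mean, min and max per
# scale -- the figures behind the bars and the black min/max lines. The
# participant scores below are invented example data, not real results.

from statistics import mean

participants = [
    {"Attractiveness": 2.1, "Perspicuity": 1.5, "Stimulation": 1.8},
    {"Attractiveness": 1.6, "Perspicuity": 2.2, "Stimulation": 0.9},
    {"Attractiveness": 2.4, "Perspicuity": 1.9, "Stimulation": 1.3},
]

for scale in participants[0]:
    values = [p[scale] for p in participants]
    print(f"{scale:15s} mean={mean(values):+.2f} min={min(values):+.2f} max={max(values):+.2f}")
```

A wide gap between min and max on a scale is exactly the kind of fluctuation mentioned above, and a hint to look at which users rated it low and why.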