There are a lot of companies that benchmark web sites. You know this if you run a big site. Everyone is telling you how you rank. One reason for this is that people who run big web sites usually like data. But another reason is simply that there are several different ways to compare sites.
Some of the more meaningful comparisons look at things like engagement or conversion rates. Compete and ComScore are examples of companies with long track records of doing this. It's important data to have.
Other companies, like Gomez and Keynote, excel at benchmarking web application performance. Mostly this involves measuring page download time and other metrics of network latency and application delivery. Like Compete and ComScore, Gomez and Keynote have long track records of doing this type of benchmarking. Again, benchmarking performance can be useful.
Into the Fray
A few years ago we jumped into the web site benchmarking fray. As a usability company, we looked at what other companies were doing and started exploring how to benchmark site usability. After all, in a lot of industries usability is very hot. We thought there might be a market.
Standing on the Shoulders of Giants
Being usability geeks, we were aware of a long tradition of academic research into modeling how humans and computers interact. User modeling in HCI goes all the way back to the 70s.[1] We thought this type of modeling might be useful for benchmarking,[2] since if you're going to compare web sites, comparing models of their users ought to be revealing. Comparing models of user experience would let you pull out interesting metrics about the sites. For example, your models should be able to show who has the easiest auto insurance rate quote process, or who has the best approach to online account opening in banking.[3]
A Little UX Modeling Theory
The path from insight to execution didn't turn out to be all that easy. It wasn't as if we could simply read a few research papers and generate a working product, as though we were following an instruction manual. If only.
There were many twists and turns along the way. In the end we found the easiest way to explain our approach was to focus on two fairly well-understood concepts: qualitative and quantitative. Things that are quantitative (at least in this context) are things that we feel we can measure with a high degree of accuracy. Things that are qualitative are things that, although we work hard to measure them consistently and accurately, rely on human judgment to be defined. (Again, these labels apply in the context of explaining our theory.)
The last concept that turned out to be useful was that of events. Pretty much everything that happens in our models of experience is an event: a key being pressed, a mouse being moved, and a word being read are all events.
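To make the idea of events a little more concrete, here is a minimal sketch in Python. It is not our actual modeling software; the `Event` class, the event names, and the choice to tag reading as qualitative are all assumptions made up for illustration. It only shows how a task could be represented as a list of events and then tallied.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One thing that happens during a task (hypothetical structure)."""
    kind: str           # e.g. "key_press", "mouse_move", "read_word"
    quantitative: bool  # True if directly countable, False if it relies on analyst judgment


def summarize(events):
    """Tally events by kind, keeping quantitative and qualitative counts separate."""
    quantitative = Counter(e.kind for e in events if e.quantitative)
    qualitative = Counter(e.kind for e in events if not e.quantitative)
    return quantitative, qualitative


# A made-up task: 5 key presses, 2 mouse moves, 12 words read.
task = (
    [Event("key_press", True)] * 5
    + [Event("mouse_move", True)] * 2
    + [Event("read_word", False)] * 12  # assumption: reading is treated as qualitative here
)

quant, qual = summarize(task)
print("key presses:", quant["key_press"])
print("mouse moves:", quant["mouse_move"])
print("words read:", qual["read_word"])
```

Once a task is a list of events, counts like the number of key presses fall out of the model directly.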
To go deeper on our approach to user experience modeling, download our whitepaper.
Model Building
There's theory, and then there's applying it. As with the theory, we had to try out a handful of approaches to model-building before one stuck. One promising approach involved creating a special-purpose programming language to describe the user experience of a device. There were some examples of people doing this in the academy. The first benchmarking research we released in 2005 relied on this technology. The problem was that it was unwieldy, largely because it was hard to tie the model back to the technology being tested. It was also pretty inefficient.
Simulation and Automation
After some thinking we came up with a new approach which worked much better. Instead of a special programming language we would build our models in a "visual environment". We would allow a special instrumented browser to capture what we could automate, and then graphically represent the model on top of a series of screenshots of the device in its various states. Human analysts could share these representations of their models. They were no longer unwieldy. Underneath, the software would do the heavy lifting of compiling the models, abstracting the metrics, and generating reports.
Is Simulation Real?
Inevitably, when talking about simulation, questions about reality emerge. How real are our models? For questions about methodological truth we refer people to the academic literature of HCI. For us the real question is: do the models produce insights that are non-obvious and useful? Based on the adoption of our approach to benchmarking by industry, the answer is yes.
The real answer is deeper. We think that simulation in many ways is actually more real than real. To put it another way, simulation allows us to get at things you can't measure other ways. You can't say how many key-presses your order on Amazon takes, but if there are too many it affects how you feel.
Who Cares about Benchmarking?
That said, not everyone likes benchmarking or wants it. For a while this was hard for us to believe. We thought every industry could benefit from it. It turns out that uptake has been highest in highly competitive industries, where there is immense pressure to differentiate one site from another. Not all industries are like this. E-commerce, for example, really isn't, but financial services is. The pressure has to be there. Caring about usability and caring about the kinds of metrics that usability benchmarking delivers turn out to be two different things.
References
[1] And it is still a lively area of research. Most cite the 1983 book The Psychology of Human-Computer Interaction (Card, Moran and Newell) as the ur-text of user modeling.
[2] As far as we know, connecting user simulation with benchmarking web sites is our contribution, although a key point of user modeling is to compare alternative designs.
[3] For us, "best" (especially in highly data-entry-intensive tasks) is the site that requires the least amount of effort. Support for tasks that are more qualitative, like first impressions of a site or information gathering, is still "best" when it requires low effort. But in our approach these tasks must also be highly informative and persuasive. We call this idea high "return on effort". For example, in our latest study of bank web sites, HSBC requires 68% more data entry than Ally in applying for a checking account online.
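A comparison like "68% more data entry" is a relative difference between two effort counts. The sketch below shows the arithmetic in Python. The function name and the keystroke counts are invented for illustration and are not figures from our study; they were chosen only so that the result comes out to 68%.

```python
def percent_more(effort: float, baseline: float) -> float:
    """Return how much more effort is required than the baseline, as a percentage of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline effort must be positive")
    return (effort - baseline) * 100 / baseline


# Hypothetical keystroke counts for the same task on two sites.
site_a_keystrokes = 84
site_b_keystrokes = 50

difference = percent_more(site_a_keystrokes, site_b_keystrokes)
print(f"Site A requires {difference:.0f}% more data entry than Site B")
```

The site with the lower count is the baseline, so the percentage reads as extra effort relative to the easier site.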