Discussion


gilgongo

#1
I'm designing a dashboard for a computer systems control center which needs to display the status of three main types of information on a screen on a wall:

A: "Binary" states: things that are either "up" or "down" (and therefore in "alert")

B: "Analogue" states with alerts configured: things like memory use, where anything over a threshold for a certain time triggers an alert

C: "Analogue" states without alerts configured
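To make type "B" concrete, here is a minimal Python sketch of a "sustained threshold" check. The function name, the sample values, and the choice of counting consecutive samples rather than wall-clock time are all assumptions for illustration, not how any particular monitoring system does it.

```python
from typing import Sequence


def sustained_alert(samples: Sequence[float], threshold: float, min_samples: int) -> bool:
    """Return True if the most recent `min_samples` readings all exceed `threshold`.

    Hypothetical helper: "a certain time" is approximated by a number of
    consecutive, evenly spaced samples.
    """
    if min_samples <= 0 or len(samples) < min_samples:
        return False
    return all(value > threshold for value in samples[-min_samples:])


# Made-up memory-use readings (percent), one per minute.
memory_use = [62, 71, 93, 95, 96, 97]

print(sustained_alert(memory_use, threshold=90, min_samples=3))
print(sustained_alert(memory_use, threshold=90, min_samples=5))
```

A short spike above the threshold does not trigger the alert; only a run of readings at least `min_samples` long does.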

In total, there may be about 500 things that will need to be monitored. About half would be either type "A" or "B".

Given so many things to monitor in a limited screen space, and the need to avoid making the dashboard interactive, my first approach was to have the dashboard show those of type "A" and "B" only when they are in alert. Type "C" would display permanently, perhaps disappearing to make room for things that were in alert.

When I mocked up this approach it made people think they'd be relying too much on the accuracy of the alert algorithms. They preferred to see everything all the time, whether or not anything was in alert, so as to form a "picture" of the state of the world. They also worried about the potentially negative effect of showing things in different positions depending on the state of other things.

So we've agreed that they'll simply have a large screen with a lot of things on it.

However, I'm curious as to what people here think about this idea. I even thought about the individual display devices being able automatically to grow into the available space to show more detailed information if they were able. A sort of "responsive dashboard", if you will.

Has anyone tried this? I can't immediately find any examples of such an approach. 




__________________
User Experience Designer - London, UK
PeterRobinson

#2
Hello, we have tried something similar and found it a difficult decision whether to show only the things in alert. If you show everything, the number of devices stays roughly constant over the short term and viewers get used to particular things being in a set place. Over the longer term, though, that changes as devices are added or removed, so the problem remains, albeit to a lesser extent than when only things in alert are visible.

We found it useful to group the devices into classes, whether applications, type of devices, whatever, something meaningful, and sort them on something like utilisation, descending, for those that have that type of measure. This then shifts the order of the lists and so it doesn't matter whether it's showing everything or just the alerts. Maybe just show alerting plus the top-n after that?
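The "alerting plus top-n" idea could be sketched roughly as follows in Python. The `Device` fields, the group names, and the `rows_to_show` helper are hypothetical and only illustrate the grouping and sorting described above.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Device:
    name: str
    group: str          # a class meaningful to the viewers (assumed field)
    utilisation: float  # 0.0 to 1.0 (assumed measure)
    alerting: bool


def rows_to_show(devices, top_n):
    """Per group: every alerting device first, then the top_n busiest of the rest."""
    result = {}
    ordered = sorted(devices, key=lambda d: d.group)  # groupby needs sorted input
    for group, members in groupby(ordered, key=lambda d: d.group):
        by_load = sorted(members, key=lambda d: d.utilisation, reverse=True)
        alerting = [d for d in by_load if d.alerting]
        quiet = [d for d in by_load if not d.alerting]
        result[group] = alerting + quiet[:top_n]
    return result


# Made-up inventory for illustration.
devices = [
    Device("db-1", "databases", 0.91, True),
    Device("db-2", "databases", 0.40, False),
    Device("db-3", "databases", 0.75, False),
    Device("web-1", "web", 0.55, False),
    Device("web-2", "web", 0.20, False),
]

for group, rows in rows_to_show(devices, top_n=1).items():
    print(group, [d.name for d in rows])
```

Because each group's list is re-sorted on every refresh, positions will shift, which is the trade-off discussed in this thread.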

One issue I have with showing only a binary on/off alert is that it gives no context. If it's on, it's on, but how long has it been on and what's the service level on it? How critical is it that it get immediate attention? What's the relative priority of it compared to others that might also be on?

I look forward to seeing where you go with this.



__________________
Peter Robinson
in Brisbane, Australia
danz

#3
Some quick thoughts.

Obviously, a large volume of distinct information can be displayed on one limited screen only if it is designed with a fixed layout. An alert-only dashboard would be difficult to follow. A paginated design with automatic page changes might work.

Even if the screen allows high resolution, information has to be visible from a few meters away. A reasonable reference resolution for designing a legible dashboard is 1366x768.

To fit so much information efficiently you need condensed representations or techniques. To name just a few: short labels or good abbreviations; sparklines; tables with horizontal, vertical or wrapped layout; histograms; boxplots; treemaps; heatmaps; maps (virtual maps, building maps, ...).

If possible, define and use meaningful aggregations for parameters. Aggregation can be time-based and/or per group. Robust statistics should be considered, but outliers may also be meaningful.

A dynamically sorted table with a refresh rate of a few seconds can be very annoying (see Task Manager sorted by CPU). If the refresh interval is more than a few minutes, a sorted table might be preferred.

Where it makes sense, it is worth using smoothing algorithms or curve regression methods to approximate variation over time, and using the results in sparklines. These approximations can also be used to predict alerts.
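As a rough illustration of smoothing and trend-based prediction, here is a minimal Python sketch. It assumes evenly spaced samples and uses simple exponential smoothing plus a least-squares straight line; those choices and both function names are assumptions picked for brevity, and other smoothing or regression methods may suit real data better.

```python
def exponential_smooth(values, alpha):
    """Simple exponential smoothing; alpha in (0, 1], higher means less smoothing."""
    smoothed = []
    for v in values:
        if not smoothed:
            smoothed.append(float(v))
        else:
            smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed


def steps_until_threshold(values, threshold):
    """Fit a least-squares line and estimate how many sample steps after the
    last sample it reaches `threshold`.

    Returns 0.0 if the fitted line is already at or above the threshold,
    and None if there are too few samples or the trend is flat or falling.
    """
    n = len(values)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    fitted_last = mean_y + slope * ((n - 1) - mean_x)
    if fitted_last >= threshold:
        return 0.0
    if slope <= 0:
        return None
    return (threshold - fitted_last) / slope


# Made-up readings for illustration.
readings = [10, 20, 30, 40]
print(exponential_smooth(readings, alpha=0.5))
print(steps_until_threshold(readings, threshold=70))
```

The smoothed series could feed a sparkline, and the step estimate could drive an early "heading for alert" hint.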

Save the colors for alert levels only. Avoid icons.

 
gilgongo

#4
Thanks Peter - that chimes in pretty well with the feedback I've had on this idea. The grouping into classes is also a good point, which I'm working with the business to further understand. It needs to fit their "mental model" of their system overall before it can make sense to them. 

As to the issue of binary on/off indication, I have attempted to add some temporal context by having these not as the traditional "traffic light" blobs, but as horizontal "time lines" that show whether or not the system has been down, and for how long:

So for example, here is a system that is currently up, but was down for a short period 6 hours ago (so the time period being monitored is 12 hours in total):

uptime.jpg 

I also thought that if it went down, we could "box it out" until a technician had picked it up for investigation. So here is something that has gone down as of 17 seconds ago and has yet to be picked up:

emergent.jpg 

I'm hoping these can be stacked and compressed well enough such that we can get a large number on a screen.



gilgongo

#5
Thanks danz - all good points to bear in mind.

For the record, my design principles (after talking with the intended users of the dashboard) are currently as follows:

1. Aim to fit as much of what we want/need to see in one place, without requiring interaction or other enforced state change.

3. Have several "modes" for the dashboard, each for a specific audience (eg manager, dev team, support desk). Each mode to display a different level of detail and/or sub-system.

4. Use devices that can easily scale up/down the amount of data encoded without losing clarity, according to screen space or "mode" above.

5. Have as few display devices as possible. Currently, about 4 seem enough to display everything we might show.

6. Reduce clutter by not showing any numbers other than times (we agreed numbers are largely useless in this context).

I'll probably think up some others (currently toying with the idea of displaying "what to do now" info when something goes into alert, although that does break principle 1 if it involved interaction).



jlbriggs

#6
Regarding your on/off line displays, I would strongly recommend muting the 'on' state (or whichever the default state is) by making the line a medium grey instead of green.

If there are a large number of these in one display, you'll have a display that is cluttered with color that pulls the user's attention.
If the default is a neutral color, and the alert state is red, it will be very easy to spot the problems and ignore what's not a problem.

With green/red, that task will instead be cognitively difficult.

gilgongo

#7
Thanks - excellent point. I'm going to engage the help of a visual designer later on when the structure has settled down a bit, so will be sure to point that out if they don't anyway.
danz

#8
Quote:
"time lines" that show whether or not the system has been down, and for how long


This can be better designed. For instance, if you have 150 pixels reserved for a 12-hour minigraph, a 10-minute issue will be only 2 pixels wide! There are parameters for which a 10-minute issue is nothing, but for some it might be critical. More than that, some malfunctions should be highlighted immediately; you cannot wait 15 minutes until the red line is ... 3 pixels wide to, maybe, see something.
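The arithmetic behind those pixel widths can be checked with a short Python sketch. The function name and the optional `min_px` floor are assumptions added for illustration; the 150-pixel, 12-hour figures come from the example above.

```python
def issue_width_px(issue_minutes, window_minutes=12 * 60, graph_px=150, min_px=0.0):
    """Width in pixels of an issue drawn to scale on a time-line minigraph.

    `min_px` is a hypothetical floor so that short issues stay visible.
    """
    if issue_minutes <= 0:
        return 0.0
    to_scale = issue_minutes / window_minutes * graph_px
    return max(to_scale, min_px)


print(round(issue_width_px(10), 2))  # 10 / 720 * 150
print(round(issue_width_px(15), 2))  # 15 / 720 * 150
print(issue_width_px(10, min_px=6))  # floor applied
```

With a floor, the drawn width stops being proportional to duration for short issues, so the display trades precision for visibility.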

Each parameter has to be carefully analysed. For a single measure, the frequency of issues, their total, average, maximum or last duration, and the time elapsed since the last issue might all mean something. A visible sign should appear if any of these statistics goes beyond predefined limits.
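The per-parameter statistics listed above could be computed along these lines. This is a hedged Python sketch: the `(start, end)` interval format in minutes, the dictionary keys, and the limit values are all hypothetical.

```python
def outage_stats(outages, now):
    """Summarise finished outages given as (start, end) minute timestamps,
    in ascending order, relative to the current time `now`."""
    if not outages:
        return {"count": 0, "total": 0, "average": 0.0,
                "longest": 0, "last": 0, "since_last": None}
    durations = [end - start for start, end in outages]
    return {
        "count": len(durations),                 # frequency of issues
        "total": sum(durations),                 # total duration
        "average": sum(durations) / len(durations),
        "longest": max(durations),               # maximum duration
        "last": durations[-1],                   # duration of the last issue
        "since_last": now - outages[-1][1],      # time since the last issue ended
    }


def breached_limits(stats, limits):
    """Names of statistics that exceed their predefined upper limit."""
    return sorted(name for name, limit in limits.items()
                  if stats[name] is not None and stats[name] > limit)


# Made-up outage history over a 12-hour (720-minute) window.
history = [(30, 40), (200, 260), (600, 605)]
stats = outage_stats(history, now=720)
print(stats)
print(breached_limits(stats, {"count": 2, "longest": 120}))
```

Any name returned by `breached_limits` would be a candidate for a visible sign on the dashboard.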

I would consider a white background and a minimum visible size for the alert shape, and use that to signal any issue lasting longer than the minimum duration considered. I would probably use intense colors for the most recent issues and a time scale that changes every hour or two. I personally do not mind a bit of redundancy (color intensity encodes time elapsed since the event, which is already encoded by the time scale and label). Eventually, I could use gray tones for events that have already been taken care of. I added an inactive parameter as well.

Something like below.

 Event Driven.png  

gilgongo

#9
I take your points, but the dashboard I'm designing needs to show the state of well over 100 systems on a single screen (or at least very few screens). Because of this, we've said that it doesn't need to be precise, just accurate. This, combined with the need to avoid overloading the devices with too many colours and states, means that we're going for a very simple approach. 

This is a sketch of the design as we currently have it, which I'm sure will evolve a bit (for example we were thinking about perhaps making tiles which are not displaying problems smaller than others).

In the example here, "Cluster config" has currently been down for a while, and was also down for a shorter time earlier in the 24 hour period. It also threw some errors previously for a long time. "Android App" meanwhile hasn't been down, but has been experiencing performance problems and errors. Other services have been fine.

Note also the lack of any numbers or quantities, as we think that's not very useful here.


tiles.png    


infinite8s

#10
Hi Gil,

Have you seen the LiveRAC research project from the University of BC? They built a dashboard that needed to monitor hundreds of devices in real time, with the ability to drill into any of them for historical analysis using a rubber sheet metaphor (watch the video to see). Here's the site (http://www.cs.ubc.ca/labs/imager/th/2006/McLachlanMscThesis/) and a demo video (http://www.cs.ubc.ca/labs/imager/video/2006/McLachlanMscThesis/liverac.mov).

Naveen
gilgongo

#11
Thanks infinite8s - I'd not seen LiveRAC. I did in fact work on a similar zooming approach for another dashboard I did several years ago though (for business metrics). 

One of the fundamental differences between a display like LiveRAC and what I'm designing is that my system must not demand interaction because it won't be interacted with. While there will be a very simple UI available for people who need to see more about an alert (click on a tile, get a popup window), it's basically an "information radiator" which then acts as a spur to action. The CEO might take rather different action to the operations manager though.

But where, how, and if you make a split between monitoring and forensics is a very tricky business. However, by adopting a principle of zero interaction, you have rather a lot of that decided for you :-) 

BTW we are also experimenting with voice alerts - which so far have proved to be pretty good at catching people's attention. I've also found that grouping the tiles into things that make sense to the business (eg "public facing", "back room", "tools") is effective in helping non-technical people get a handle on what the alerts mean in practice, even if they don't know what, for example, a "MySQL Cluster" is.
