Improving the Signal to Noise Ratio – Revisited

 

Additional thoughts about signals and noise that have been rattling around in my brain since first posting on this topic.

At the risk of becoming too ethereal about all this, before there is signal and before there is noise, there is data. Cold, harsh, cruelly indifferent data. It is only after raw data encounters some sort of filter or boundary – something that triggers an evaluation of what that data means or whether it is relevant to whoever is on the other side of the filter – that it begins to be characterized as “signal” or “noise.”

Since we’re talking about humans in this series of posts, that filter is an amazingly complex system built from both physiological and psychological elements. The small amount of physical data that hits our senses and actually makes it to our brains is then filtered by beliefs, values, biases, attitudes, emotions, and those pesky unicorns that can’t seem to stop talking while I’m trying to think! Only after all this processing has the data been sorted into “signal” (what’s relevant) and “noise” (what’s irrelevant) for any particular individual. Our individual systems of filters impose value judgments on the data such that each of us, essentially, creates “signal” and “noise” from the raw data.

That’s a long-winded way to say:

data -> [filter] -> signal, noise

Now apply this to everyone on the planet.

data -> [filter 1] -> signal 1, noise 1

data -> [filter 2] -> signal 2, noise 2

data -> [filter n] -> signal n, noise n

Google (itself a filter) is a useful example. Let’s assume for a moment that Google is some naturally occurring phenomenon and not a filter created by humans with their own set of filters driving what it means to create a “let’s be evil” good search engine. Suppose my friend Bob enters search criteria of interest to him – “filter 1” – and retrieves 1,000,000 pieces of information. Maybe he searched for “healthy keto diet recipes”. Scanning those search results, I determine (using my “filter 2”) that 100% of them are useless, because my filter is “how do i force the noisy unicorns in my head to shut the hell up”. The Venn diagram of those two searches is likely to show a vanishingly small overlap. (Disclaimer: I have no knowledge of the carbohydrate content of unicorns nor how tasty they may be when served with capers and a lemon dill sauce.)

Google may return 1,000,000 search results. But only a small subset is viewable at a time. What of the rest of the result set that I know nothing about? Is it signal? Is it noise? Is it just data that has yet to be subjected to anyone’s system of filters? Because Google found stuff, does that make it signal? Accepting all 1,000,000 search results as signal seems to require a willingness to believe that Google knows best when it comes to determining what’s important to me. This would apply to any filter not our own.

All systems for distinguishing signal from noise are imperfect and some of us on the Intertubes are seeking ways to better tune our particular systems. The system I use lets non-relevant data fall through the sieve so that the gold nuggets are easier to find. Perhaps at some future date I’ll unwittingly re-pan the same chunk of data through an experience-refined sieve and a newly relevant gem will emerge from the dirt. But until that time, I’ll trust my filters, let the dirt go as noise, and lurch forward.


Photo by Nathan Dumlao on Unsplash

The Shopping Mall in Your Inbox

Ian Bogost writes in the Atlantic:

It feels like every company and organization I’ve ever transacted with sends me email every week. Some every day, even. Some multiple times a day.

I see this in action with my wife’s “free” email account hosted by one of the major players in the email provider space. She has another email account that sees none of this. Not because that address is unknown but because it’s hosted on a private email server that I run. Nothing gets through that isn’t specifically allowed by the postfix/spamassassin filters and lists. It’s quite simple, really. If either of us wants to do business with a company that requires an email address, I create an email alias with the company’s name as the username. That alias is then assigned as a value to a “whitelist_to” key for spamassassin and entered into a virtual aliases database table so postfix knows the actual email account where the message should be delivered.

If we start getting messages to an alias from companies that don’t match its username, we know the original company either sold their list or has been compromised. When that happens, the alias assigned to the “whitelist_to” key is reassigned to a “blacklist_to” key and the alias is removed from the virtual aliases database table. We never see another message sent to that alias.
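
For anyone curious about the mechanics, here is a minimal sketch of that add-and-revoke workflow in Python. It assumes a hypothetical layout – spamassassin rules in /etc/mail/spamassassin/local.cf, a SQLite-backed postfix virtual alias map with a table virtual_aliases(alias, destination), and a real mailbox of inbox@example.net – none of which are the actual paths, schema, or addresses on my server.

    #!/usr/bin/env python3
    """Sketch of a per-vendor alias workflow (hypothetical paths and schema)."""
    import sqlite3

    SA_LOCAL_CF = "/etc/mail/spamassassin/local.cf"   # hypothetical spamassassin config
    ALIAS_DB = "/etc/postfix/virtual_aliases.sqlite"  # hypothetical SQLite alias map
    REAL_MAILBOX = "inbox@example.net"                # hypothetical delivery address

    def add_vendor_alias(vendor, domain="example.net"):
        """Create vendor@domain, whitelist it, and map it to the real mailbox."""
        alias = f"{vendor}@{domain}"
        # Let spamassassin pass anything addressed to this alias.
        with open(SA_LOCAL_CF, "a") as cf:
            cf.write(f"whitelist_to {alias}\n")
        # Tell postfix where mail for this alias should actually be delivered.
        with sqlite3.connect(ALIAS_DB) as db:
            db.execute(
                "INSERT INTO virtual_aliases (alias, destination) VALUES (?, ?)",
                (alias, REAL_MAILBOX),
            )
        return alias

    def revoke_vendor_alias(vendor, domain="example.net"):
        """Flip a leaked alias from whitelist_to to blacklist_to and stop delivery."""
        alias = f"{vendor}@{domain}"
        with open(SA_LOCAL_CF) as cf:
            lines = cf.readlines()
        with open(SA_LOCAL_CF, "w") as cf:
            for line in lines:
                # Rewrite the whitelist entry as a blacklist entry; keep everything else.
                if line.strip() == f"whitelist_to {alias}":
                    cf.write(f"blacklist_to {alias}\n")
                else:
                    cf.write(line)
        with sqlite3.connect(ALIAS_DB) as db:
            db.execute("DELETE FROM virtual_aliases WHERE alias = ?", (alias,))

In a real setup, postfix would need its virtual_alias_maps pointed at that table and spamassassin would need a reload after the config change; the point is only how small the moving parts are.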

In a very few cases, this also means changing the email address registered with the original vendor. But more often than not, the alias was set up for a one-time purchase and can safely be dropped. I have hundreds of these aliases set up.

The major email providers could implement a system like this. But their business model relies on advertising revenue and such a system isn’t in their interests. The people who have signed up for a “free” email account are, after all, part of their product. And why would they listen to their product and not their paying customers?

Speculating about a perfect world, Ian continues…

The algorithms churn until you’ve interacted with enough promotional emails that every store you like delivers perfectly timed messages that cater to your every need and desire. Fulfilled, happy, you purchase every item advertised to you. Of course, this is not actually what happens. The irony of people’s supposed desire to receive emails from their favorite companies is that more than half of consumers in the United States and Canada say they receive too much promotional email. Personalization is supposed to make relevant messages get through and irrelevant ones falter. But what “relevance” means is constantly changing. If I need new pants, an apparel ad might be welcome. If I don’t, it’s just annoying.

This is the downside of a push system. My email server is more aligned with a pull system. If we need or want something, we go look for it, usually at some place we’ve shopped before. It’s a minimalist mindset thing. The personal habits around this approach greatly minimize impulse purchases. The lack of stuff in our life reflects the success of this strategy – we have no debt, drive 20-year-old cars, and replace phones only when the apps we need no longer work because Android no longer supports them.

Ian continues…

The fundamental reason marketing emails are clogging up everyone’s inboxes is that email marketing represents the collective outcome of a company’s interactions with its customers and the mailbox services; it does not, in other words, represent me.

Indeed. Virtually all email marketing from a vendor is opt-out once you’ve given them your email address. Research suggests there would be a significant shift in the amount of follow-on marketing if everything were opt-in.

Quoting Helena Tse, the vice president of marketing at Bonobos, Ian observes:

She also told me that the company “continuously explores how to better our toolkit to automate and optimize the frequency business rules.” A professional proclamation such as this might elicit nods at an industry conference, but it casts me in the role of a generator of data rather than a paying customer—let alone a living person who doesn’t need so many invitations to partake of Riviera shorts.

As I said, when you give your “free” email address to a business, you’ve become part of the overall product.


Photo by Matthew Henry on Unsplash

Story Points and Fuzzy Bunnies

The scrum framework is forever tied to the language of sports in general and rugby in particular. We organize our project work around goals, sprints, points, and daily scrums. An unfortunate consequence of organizing projects around a sports metaphor is that the language of gaming ends up driving behavior. For example, people have a natural inclination to treat story points as a measure of success rather than an indicator of the effort required to complete the story. The more points you have, the more successful you are. This is reflected in an actual quote from a retrospective on things a team did well:

We completed the highest number of points in this sprint than in any other sprint so far.

This was a team that lost sight of the fact they were the only team on the field. They were certain to be the winning team. They were also destined to be the losing team. They were focused on story point acceleration rather than a constant, predictable velocity.

More and more I’m finding less and less value in using story points as an indicator for level of effort estimation. If Atlassian made it easy to change the label on JIRA’s story point field, I’d change it to “Fuzzy Bunnies” just to drive this idea home. You don’t want more and more fuzzy bunnies, you want no more than the number you can commit to taking care of in a certain span of time typically referred to as a “sprint.” A team that decides to take on the care and feeding of 50 fuzzy bunnies over the next two weeks but has demonstrated – sprint after sprint – they can only keep 25 alive is going to lose a lot of fuzzy bunnies over the course of the project.

It is difficult for people new to scrum or Agile to grasp the purpose behind an abstract idea like story points. Consequently, they are unskilled in how to use them as a measure of performance and improvement. Developing this skill can take considerable time and effort. The care and feeding of fuzzy bunnies, however, they get, particularly on teams that include non-technical domains of expertise such as content development or learning strategy.

A note here for scrum masters. Unless you want to exchange your scrum master stripes for a saddle and spurs, be wary of your team turning story pointing into an animal farm. Sizing story cards to match the exact size and temperament of all manner of animals would be just as cumbersome as the sporting method of story points. So, watch where you throw your rope, Agile cowboys and cowgirls.


Image by Simona Robová from Pixabay

Agile Metrics – Time (Part 3 of 3)

In Part 1 of this series, we set the frame for how to use time as a metric for assessing Agile team and project health. In Part 2, we looked at shifts in the cross-over point between burn-down and burn-up charts. In Part 3, we’ll look at other asymmetries and anomalies that can appear in time burn-down/burn-up charts and explore the issues the teams may be struggling with under these circumstances.

Figure 1 shows a burn-up that by the end of the sprint significantly exceeded the starting value for the original estimate.

Figure 1

There isn’t much mystery around a chart like this. The time needed to complete the work was significantly underestimated. The mystery is in the why and what that led to this situation.

  • Were there unexpected technical challenges?
  • Were the stories poorly defined?
  • Were the acceptance criteria unclear?
  • Were the sprint goals, objectives, or minimum viable product definition unclear?

Depending on the tools used to capture team metrics, it can be helpful to look at individual performances. What’s the differential between story points, estimated time, and actual time for each team member? Hardly ever useful as a disciplinary tool, this type of analysis can be invaluable for knowing who needs professional development and in what areas.
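
As a concrete (and entirely hypothetical) example of that differential, the calculation is nothing more than summing estimated and actual hours per assignee and comparing:

    # Per-member estimate vs. actual differential (hypothetical card data).
    from collections import defaultdict

    # (assignee, estimated_hours, actual_hours) pulled from the sprint's cards
    cards = [
        ("alice", 8, 13),
        ("alice", 5, 7),
        ("bob", 8, 8),
        ("bob", 3, 2),
    ]

    totals = defaultdict(lambda: {"estimated": 0.0, "actual": 0.0})
    for assignee, estimated, actual in cards:
        totals[assignee]["estimated"] += estimated
        totals[assignee]["actual"] += actual

    for assignee, t in totals.items():
        delta_pct = 100.0 * (t["actual"] - t["estimated"]) / t["estimated"]
        print(f"{assignee}: estimated {t['estimated']:.0f}h, "
              f"actual {t['actual']:.0f}h, differential {delta_pct:+.0f}%")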

In this case, there were several technical challenges related to new elements of the underlying architecture and the team put in extra hours to resolve them. Even so, they were unable to complete all the work they committed to in the sprint. The scrum master and product owner need to monitor this so it doesn’t become a recurrent event; left unchecked, it risks team burnout and morale erosion. There are likely some unstated dependencies or skill deficiencies that need to be put on the table for discussion during the retrospective.

Figure 2 shows, among other things, unexpected jumps in the burn-down chart. There is clearly a significant amount of thrashing evident in the burn-down (which stubbornly refuses to actually burn down).

Figure 2

Questions to explore:

  • Are cards being brought into the sprint after the sprint has started and why?
  • Are original time estimates being changed on cards after the sprint has started?
  • Is there a stakeholder in the grass, meddling with the team’s commitment?
  • Was a team member added to the team and cards brought into the sprint to accommodate the increased bandwidth?
  • Whatever is causing the thrashing, is the team (delivery team members, scrum master, and product owner) aware of the changes?

Scope change during a sprint is a very undesirable practice. Not just because it goes against the scrum framework, but more so because it almost always has an adverse effect on team morale and focus. If there is an addition to the team, better to set that person to work helping teammates complete the work already defined in the sprint and assign them cards in the next sprint.

If team members are adjusting original time estimates for “accuracy” or whatever reason they may provide, this is little more than gaming the system. It does more harm than good, assuming management is Agile savvy and not intent on using Agile metrics for punitive purposes. On occasion I’ve had to hide the original time estimate entry field from the view of delivery team members and place it under the control of the product owner – out of sight, out of mind. It’s less a concern to me that time estimates are “wrong,” particularly if time estimate accuracy is showing improvement over time or the delta is a somewhat consistent value. I can work with a delivery team member’s time estimates that are 30% off if they are consistently 30% off.
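
The consistency check is simple enough to script. A sketch, with made-up numbers, of what “consistently 30% off” looks like:

    # Is an estimator's error consistent? (hypothetical sprint-by-sprint ratios)
    import statistics

    # Ratio of actual to estimated hours for one team member, per sprint.
    ratios = [1.28, 1.31, 1.33, 1.27, 1.30]

    mean_ratio = statistics.mean(ratios)
    spread = statistics.stdev(ratios)
    print(f"mean actual/estimate ratio: {mean_ratio:.2f}, spread: {spread:.2f}")
    # A mean near 1.30 with a small spread means the estimates are "wrong" in a
    # predictable way -- planning can simply be scaled by the mean ratio.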

In the case of Figure 2 it was the team’s second sprint and at the retrospective the elephant was called out from hiding: The design was far from stable. The decision was made to set aside scrum in favor of using Kanban until the numerous design issues could be resolved.

Figure 3 shows a burn-down chart that doesn’t go to zero by the end of the sprint.

Figure 3

The team missed their commit and quite a few cards rolled to the next sprint. Since the issue emerged late in the sprint there was little corrective action that could be taken. The answers were left to be discovered during the retrospective. In this case, one of the factors was the failure to move development efforts into QA until late in the sprint. This is an all-too-common issue in cases where the sprint commitments were not fully satisfied. For this team the QA issue was exacerbated by the team simply taking on more work than they could complete. The solution was to reduce the amount of work the team committed to in subsequent sprints until a stable sprint velocity emerged.
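
That corrective step amounts to capping the next commitment at the demonstrated velocity, something like this (hypothetical numbers):

    # Cap the next sprint's commitment at the demonstrated rolling velocity.
    completed_points = [21, 18, 25, 19]  # points actually completed in recent sprints
    window = 3

    rolling_velocity = sum(completed_points[-window:]) / window
    print(f"commit to no more than ~{rolling_velocity:.0f} points next sprint")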

Conclusion

For a two week sprint on a project that is 5-6 sprints in, I usually don’t bother looking at time burn-down/burn-up charts for the first 3-4 days. Early trends can be misleading, but by the time a third of the sprint has been completed this metric will usually start to show trends that point to any emerging problems. For new projects or for newly formed teams I typically don’t look at intra-sprint time metrics until much later in the project life cycle as there are usually plenty of other obvious and more pressing issues to work through.

I’ll conclude by reiterating my caution that these metrics are yardsticks, not micrometers. It is tempting to read too much into pretty graphs that have precise scales. Rather, the expert Agilist will let the metrics, whatever they are, speak for themselves and work to limit the impact of any personal cognitive biases.

In this series we’ve explored several ways to interpret the signals available to us in estimated time burn-down and actual time burn-up charts. There are numerous other scenarios that can reveal important information from such burn-down/burn-up charts and I would be very interested in hearing about your experiences with using this particular metric in Agile environments.

Agile Metrics – Time (Part 2 of 3)

In Part 1 of this series, we set the frame for how to use time as a metric for assessing Agile team and project health. In Part 2, we’ll look at shifts in the cross-over point between burn-down and burn-up charts and explore what issues may be in play for the teams under these circumstances.

Figure 1 shows a cross-over point occurring early in the sprint.

Figure 1

 

This suggests the following questions:

  • Is the team working longer hours than needed? If so, what is driving this effort? Are any of the team members struggling with personal problems that have them working longer hours? Are they worried they may have committed to more work than they can complete in the sprint and are therefore trying to stay ahead of the work load? Has someone from outside the team requested additional work outside the awareness of the product owner or scrum master?
  • Has the team overestimated the level of effort needed to complete the cards committed to the sprint? If so, this suggests an opportunity to coach the team on ways to improve their estimating or the quality of the story cards.
  • Has the team focused on the easy story cards early in the sprint, leaving work on the more difficult story cards pending? This isn’t necessarily a bad thing, just something to know and be aware of after confirming this with the team. If accurate, it also points out the importance of using this type of metric for intra-sprint monitoring only and not extrapolating what it shows into a project-level metric.

The answers to these questions may not become apparent until later in the sprint and the point isn’t to try and “correct” the work flow based on relatively little information. In the case of Figure 1, the “easy” cards had been sized as being more difficult than they actually were. The more difficult cards were sized too small and a number of key dependencies were not identified prior to the sprint planning session. This is reflected in the burn-up line that significantly exceeds the initial estimate for the sprint, the jumps in the burn-down line, and the subsequent failure to complete a significant portion of the cards in the sprint backlog. All good fodder for the retrospective.

Figure 2 shows a cross-over point occurring late in the sprint.

Figure 2

On the face of it there are two significant stretches of inactivity. Unless you’re dealing with a blatantly apathetic team, there is undoubtedly some sort of activity going on. It’s just not being reflected in the work records. The task is to find out what that activity is and how to mitigate it.

The following questions will help expose the cause for the extended periods of apparent inactivity:

  • Are one or more members not feeling well or are there other personal issues impacting an individual’s ability to focus?
  • Have they been poached by another project to work on some pressing issue?
  • Are they waiting for feedback from stakeholders, clients, or other team members?
  • Are the story cards unclear? As the saying goes, story cards are an invitation to a conversation. If a story card is confusing, contradictory, or unclear, then the team needs to talk about that. What’s unclear? Where’s the contradiction? As my college calculus professor used to ask when teaching us how to solve math problems, “Where’s the source of the agony?”

The actual reasons behind Figure 2 were twofold. There was a significant technical challenge the developers had to resolve that wasn’t sufficiently described by any of the cards in the sprint, and later in the sprint several key resources were pulled off the project to deal with issues on a separate project.

Figure 3 shows a similar case of a late sprint cross-over in the burn-down/burn-up chart. The reasons for this occurrence were quite different than those shown in Figure 2.

Figure 3

 

This was an early sprint and a combination of design and technical challenges were not as well understood as originally thought at the sprint planning session. As these issues emerged, additional cards were created in the product backlog to be addressed in future sprints. Nonetheless, the current sprint commitment was missed by a significant margin.

In Part 3, we’ll look at other asymmetries and anomalies that can appear in time burn-down/burn-up charts and explore the issues that may be in play for the teams under these circumstances.

Agile Metrics – Time (Part 1 of 3)

Some teams choose to use card-level estimated and actual time as one of the level of effort or performance markers for project progress and health. For others it’s a requirement of the work environment due to management or business constraints. If your situation resembles one of these cases then you will need to know how to use time metrics responsibly and effectively. This series of articles will establish several common practices you can use to develop your skills for evaluating and leveraging time-based metrics in an Agile environment.

It’s important to keep in mind that time estimates are just one of the level of effort or performance markers that can be used to track team and project health. There can, and probably should, be other markers in the overall mix of how team and project performance is evaluated. Story points, business value, quality of information and conversation from stand-up meetings, various product backlog characteristics, cycle time, and cumulative flow are all examples of additional views into the health and progress of a project.

In addition to using multiple views, it’s important to be deeply aware of the strengths and limits presented by each of them. The limits are many while the strengths are few.  Their value comes in evaluating them in concert with one another, not in isolation.  One view may suggest something that can be confirmed or negated by another view into team performance. We’ll visit and review each of these and other metrics after this series of posts on time.

Real situations are never as cut and dried as the examples presented in this series. Just as I previously described multiple views based on different metrics, each metric can offer multiple views. My caution is that these views shouldn’t be read like an electrocardiogram, with the expectation of a rigidly repeatable pattern from which a slight deviation could signal a catastrophic event. The examples are extracted from hundreds of sprints and dozens of projects over the course of many years and are more like seismology graphs – they reveal patterns over time that are very much context dependent.

Estimated and actual time metrics allow teams to monitor sprint progress by comparing time remaining to time spent. Respectively, these are the burn-down and burn-up charts, named for the direction the data moves on the chart. In Figure 1, the red line represents the estimated time remaining (burn-down) while the green line represents the amount of time logged against the story cards (burn-up) over the course of a two week sprint. (The gray line is a hypothetical ideal for burn-down.)

Figure 1
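
For readers who prefer numbers to chart descriptions, here is a minimal sketch (hypothetical hours) of how the two series are built. Note that the real burn-down line comes from the remaining estimates on the open cards, which the team may re-estimate mid-sprint; the sketch simplifies that to the original estimate minus time logged.

    # Build burn-down and burn-up series for a ten-working-day sprint.
    total_estimate = 200.0  # sum of card estimates at sprint start, in hours
    hours_logged_per_day = [0, 18, 22, 20, 15, 25, 21, 19, 24, 20]

    burn_down, burn_up = [], []
    logged_so_far = 0.0
    for logged in hours_logged_per_day:
        logged_so_far += logged
        burn_up.append(logged_so_far)
        # Simplification: remaining = original estimate minus logged time.
        burn_down.append(max(total_estimate - logged_so_far, 0.0))

    for day, (down, up) in enumerate(zip(burn_down, burn_up), start=1):
        print(f"day {day:2d}: burn-down {down:6.1f}h, burn-up {up:6.1f}h")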

The principal value of a burn-down/burn-up chart for time is the view it gives to intra-sprint performance. I usually look at this chart just prior to a team’s daily stand-up to get a sense of whether there are any questions I need to be asking about emerging trends. In this series of posts we’ll explore several of the things to look for when preparing for a stand-up. At the end of the sprint, the burn-down/burn-up chart can be a good reference to use during the retrospective when looking for ways to improve.

The sprint shown in Figure 1 is about as ideal a picture as one can expect. It shows all the points I look for that tell me, insofar as time is concerned, the sprint performance is in good health.

  • There is a cross-over point roughly in the middle of the sprint.
  • At the cross-over point about half of the estimated time has been burned down.
  • The burn-down time is a close match to the burn-up at both the cross-over point and the end of the sprint.
  • The burn-down and burn-up lines show daily movement in their respective directions.

In Part 2, we’ll look at several cases where the cross-over point shifts and explore the issues the teams under these circumstances might be struggling with.

Achieving 10x

Crossed paths with an old but still relevant conversation thread on Slashdot asking “What practices impede developers’ productivity?” The conversation is in response to an excellent post by Steve McConnell addressing productivity variations among software developers and teams and the origin of “10x” – that is, the observation noted in the wild of “10-fold differences in productivity and quality between different programmers with the same levels of experience and also between different teams working within the same industries.”

The Slashdot conversation has two main themes. One focuses fundamentally on communication: “good” meetings, “bad” meetings, the time of day meetings are held, status reports by email – good, status reports by email – bad, interruptions for status reports, perceptions of productivity among non-technical coworkers and managers, unclear development goals, unclear development assignments, unclear deliverables, too much documentation, too little documentation, poor requirements.

A second theme in the conversation is reflected in what systems dynamics calls “shifting the burden”: individuals or departments that do not need to shoulder the financial burden of holding repetitively unproductive meetings involving developers, arrogant developers who believe they are beholden to none, the failure to run high quality meetings, coding fast and leaving thorough testing for QA, reliance on tools to track and enhance productivity (and then blaming them when they fail), and, again, poor requirements.

These are all legitimate problems. And considered as a whole, they defy strategic interventions to resolve. The better resolutions are more tactical in nature and rely on the quality of leadership experience in the management ranks. How good are they at 1) assessing the various levels of skill among their developers and 2) combining those skills to achieve a particular outcome? There is a strong tendency, particularly among managers with little or no development experience, to consider developers as a single complete package. That is, every developer should be able to write new code, maintain existing code (theirs and others), debug any code, test, and document. And as a consequence, developers should be interchangeable.

This is rarely the case. I can recall an instance where a developer, I’ll call him Dan, was transferred into a group for which I was the technical lead. The principal product for this group had reached maturity and as a consequence was beginning to become the dumping ground for developers who were not performing well on projects requiring new code solutions. Dan was one of these. He could barely write new code that ran consistently and reliably on his own development box. But what I discovered is that he had a tenacity and technical acuity for debugging existing code.

Dan excelled at this and thrived when this became the sole area of his involvement in the project. His confidence and respect among his peers grew as he developed a reputation for being able to ferret out particularly nasty bugs. Then management moved him back into code development where he began to slide backward. I don’t know what happened to him after that.

Most developers I’ve known have had the experience of working with a 10x developer, someone with a level of technical expertise and productivity that is undeniable, a complete package. I certainly have. I’ve also had the pleasure of managing several. Yet how many 10x specialists have gone underutilized because management was unable to correctly assess their skills and assign them tasks that match their skills?

Most of the communication issues and shifting the burden behaviors identified in the Slashdot conversation are symptomatic of management’s unrealistic expectations of relative skill levels among developers and their inability to assess and leverage the skills that exist within their teams.


Image by alan9187 from Pixabay

Agile Team Composition: Generalists versus Specialists

Levels of effort for a set of tasks can be estimated efficiently and reliably by a group of individuals well qualified to complete those tasks using a collaborative estimation process like planning poker. Such teams have a good measure of skill overlap. In the context of the problem set, each team member is a generalist in the sense that it’s possible for any one team member to work on a variety of cross functional tasks during a sprint. Differences in preferred coding language among team members, for example, are less of an issue when everyone understands advanced coding practices and the underlying architecture for the solution.

With a set of complementary technical skills it is easier to agree on work estimates. There are other benefits that flow from well-matched teams. A stable sprint velocity emerges much sooner. There is greater cross functional participation. And re-balancing the work load when “disruptors” occur – like vacations, illness, uncommon feature requests, etc. – is easier to coordinate.

Once the set of tasks starts to include items that fall outside the expertise of the group and the group begins to include cross functional team members, a process like planning poker becomes increasingly less reliable. The issue is the mismatch between relative scales of expertise. A content editor is likely to have very little insight into the effort required to modify a production database schema. Their estimation may be little more than a guess based on what they think it “should” be. Similarly for a coder faced with estimating the effort needed to translate 5,000 words of text from English to Latvian. Unless, of course, you have an English speaking coder on your team who speaks fluent Latvian.

These distinctions are easy to spot in project work. When knowledge and solution domains have a great deal of overlap, generalization allows for a lot of high quality collaboration. However, when an Agile team is formed to solve problems that do not have a purely technical solution, specialization rather than generalization has a greater influence on overall success. The risk is that with very little overlap specialized team expertise can result in either shallow solutions or wasteful speculation – waste that isn’t discovered until much later. Moreover, re-balancing the team becomes problematic and most often results in delays and missed commitments due to the limited ability for cross functional participation among team mates.

The challenge for teams where knowledge and solution domains have minimal overlap is to manage the specialized expertise domains in a way that is optimally useful. That is, reliable, predictable, and actionable. Success becomes increasingly dependent on how good an organization is at estimating levels of effort when the team is composed of specialists.

One approach I experimented with was to add a second dimension to the estimation: a weight factor for the estimator’s level of expertise relative to the nature of the card being considered. The idea is that with a weighted expertise factor calibrated to the problem and solution contexts, a more reliable velocity emerges over time. In practice, this was difficult to implement. Teams spent valuable time challenging what the weighted factor should be and less experienced team members felt their opinion had been, quite literally, discounted.
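
For clarity, the mechanics of that two-dimensional estimate look something like the following sketch (the numbers and weights are made up, not from any actual team):

    # Weight each estimator's vote by a 0..1 expertise factor for the card's domain.
    def weighted_estimate(votes):
        """votes: list of (points, expertise_weight) tuples for one card."""
        total_weight = sum(weight for _, weight in votes)
        if total_weight == 0:
            return None
        return sum(points * weight for points, weight in votes) / total_weight

    # Hypothetical card: a production database schema change, estimated by a mixed team.
    votes = [
        (8, 1.0),   # senior back-end developer, deep schema experience
        (5, 0.6),   # front-end developer, some exposure
        (13, 0.1),  # content editor, essentially guessing
    ]
    print(round(weighted_estimate(votes), 1))  # 7.2 -- the guess barely moves the result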

The approach I’ve had the most success with on teams with diverse expertise is to have story cards sized by the individual assigned to complete the work. This still happens in a collaborative refinement or planning session so that other team members can contribute information that is often outside the perspective of the work assignee. Dependencies, past experience with similar work on other projects, missing acceptance criteria, or a refinement to the story card’s minimum viable product (MVP) definition are all examples of the kind of information team members have contributed. This invariably results in an adjustment to the overall level of effort estimate on the story card. It also has made details about the story card more explicit to the team in a way that a conversation focused on story point values doesn’t seem to achieve. The conversation shifts from “What are the points?” to “What’s the work needed to complete this story card?”

I’ve also observed that by focusing ownership of the estimate on the work assignee, accountability and transparency tend to increase. Potential blockers are surfaced sooner and team members communicate issues and dependencies more freely with each other. Of course, this isn’t always the case and in a future post I’ll explore aspects of team composition and dynamics that facilitate or prevent quality collaboration.


Image by Gerd Altmann from Pixabay

Cook’s Theory of Performance Evaluation

The ideas presented here evolved from a post titled “Evaluate people at their best or their worst?” on John Cook’s blog. In order to make this post a little tighter, I’ll refer to John’s ideas as “Cook’s Theory of Performance Evaluation” and describe it as follows.

John identifies three ways a person’s performance can be evaluated:

  1. How good are they at their worst?
  2. How good are they on average?
  3. How good are they at their best?

Cook observes that schools evaluate performance using the first two benchmarks and markets use the third benchmark. To illustrate, consider the following assignment grades for a student in a hypothetical course:

Assignment Score
1 100
2 90
3 92
4 87
5 100
6 45
7 90
8 100
9 95
10 100
Average: 89.9
Result: B+

Graphically, this looks like:

The student’s worst performance pulls the grade average down and results in a B+ for the course. Performance evaluation in markets, however, is only interested in how well you do, that is, your best. Consider the following sales volumes for a fictional author for each of ten books:

Book Copies Sold
1 1,000
2 2,500
3 900
4 1,100
5 3,400
6 1,000,000
7 42,000
8 6,500
9 2,750
10 3,100
Result: Bestseller!

Graphically, this looks like:

Number 6 must have been some story. But as they say, you can’t argue with success and this author will forever be known as a bestseller. Subsequent flops won’t change that.
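
A couple of lines of Python make the contrast concrete, using the numbers from the two tables above:

    # School scores by the average; the market scores by the best.
    grades = [100, 90, 92, 87, 100, 45, 90, 100, 95, 100]
    copies_sold = [1000, 2500, 900, 1100, 3400, 1000000, 42000, 6500, 2750, 3100]

    print(sum(grades) / len(grades))  # 89.9 -> a B+, dragged down by the single 45
    print(max(copies_sold))           # 1000000 -> "bestseller", flops notwithstanding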

So there you have Cook’s Theory of Performance Evaluation. The consequences of this theory when played out in real life are noted by Cook:

We all want others to see the best in us. There are ethical and economic reasons to look for the best in others. But years of education can incline us to look for the worst in others and in ourselves.

Another point that can be made is that in school, everyone starts out with a perfect score that for most students erodes as the class progresses. In markets, everyone starts out with essentially a zero score that for most entrepreneurs improves over time, commensurate with the individual’s effort. Money, of course, is another way to keep score in market-based performance evaluations.

If education has conditioned us to look for the worst in others and ourselves, it has also conditioned us to become demoralized when encountering even the slightest failure that diminishes our chances at succeeding. Once lost, the perfect score can never be regained, so we settle for less. The greater the failure, the less we must settle for.

Moreover, we are conditioned that we can never exceed the highest possible achievement as defined by academia. The best we can do is match it. Most come up short. This conditioning is difficult to shake and, in my own experience, took several years to wear off after obtaining my undergraduate degrees. Nothing like 100+ job rejection letters to cause one to reevaluate the nature and size of the door opened by a couple of college degrees.

There are other ways to evaluate an individual’s performance.

  1. How good are they compared to others (past and present)?
  2. How good are they compared to themselves in the past?
  3. How good are they compared to their personal criteria and expectations?
  4. How good are they compared to the criteria and expectations of others?

The answer to each of these questions can be radically different depending on the referential index of the questioner. “How good am I when compared to others?” is significantly different from “How good is he/she when compared to others?”

The answers to each of these questions can also be significantly influenced by various biases and prejudices. Confirmation bias, hindsight bias, self-serving bias, the Dunning–Kruger effect, the misinformation effect, self-handicapping, self-fulfilling prophecies, introspection illusion, groupthink, the affect heuristic – numerous ways an individual can skew the evaluation of their own performance.

When the performance evaluation comes from a third party, for example a university professor evaluating a student’s performance, there is a different combination of biases in play which can have an independent impact on the performance score. The fundamental attribution error, confirmation bias, the illusion of transparency, credentialism or argument from authority – more ways the individual’s eventual performance score can be unconsciously influenced. The combination of unconscious incompetence and the Dunning–Kruger effect can have a particularly adverse effect on the student who asks questions that expose a professor’s incompetence.

Here again, the level playing field appears to be with market-based performance evaluations. An individual’s ability to understand and mitigate biases and prejudices affecting their success will have a direct impact on their performance in the market. Students, however, have less influence over these drivers when they are manifest in individuals working from a position of authority.


Image by StockSnap from Pixabay