Threshold for automation

When taking on a new role, you’ll inevitably have new tasks and responsibilities. Some of those tasks will be repeated, and at some point you might wonder whether automating the task would be more efficient than continuing to do the task manually.


For me, the threshold for automation typically arrives after I’ve done a task just once. Knowing I may have to repeat a task actually influences how I undertake it the first time. That is, while doing the task, I’m asking myself how I might change what I need to do right now, based on the likelihood that I will have to do the task 10 more times, or 1,000 more times.
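One way to make that question concrete is a back-of-the-envelope break-even estimate. The sketch below is only an illustration, not a rule from this post: the function name `break_even_runs` and every minute value in it are hypothetical.

```python
import math

def break_even_runs(manual_minutes, automated_minutes, build_minutes):
    """Number of repeats after which automating costs less total time than working by hand.

    All inputs are illustrative assumptions, measured in minutes.
    """
    saved_per_run = manual_minutes - automated_minutes
    if saved_per_run <= 0:
        # Automation saves no time per run, so it never pays back on time alone.
        return math.inf
    return build_minutes / saved_per_run

# Hypothetical task: 30 minutes by hand, 2 minutes once scripted, 4 hours to write the script.
runs_needed = break_even_runs(manual_minutes=30, automated_minutes=2, build_minutes=240)
print(f"Automation pays for itself after about {runs_needed:.1f} runs")
```

With these made-up numbers the script pays for itself after roughly nine runs, so a task you expect to repeat 10 times already clears the bar, and one repeated 1,000 times clears it easily. The estimate ignores the other benefits of automation, such as fewer errors.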

I’ve worked in biotechnology companies where we would typically run an experiment and, after analyzing the results, decide to re-run it, perhaps with a minor tweak to the protocol, or using a different set of subjects or experimental conditions. In some cases, the insights derived from the experiments pushed us to make the experiment operational, that is, a regular and repeated part of our workflow. This meant that we would run the experiment once a week, or once a month, to generate new data and glean new insights.

Whether the actual experiment could or should be automated would depend on the nature of the experiment.

However, analyzing data from the experiment was almost always a candidate for automation. If I knew I had to refresh the data and do the analysis again, I would always write R scripts to automate the data analysis and report generation. I wanted to automate the parts of the workflow that took time, required attention to detail and precision, were well-defined, and were often mundane, boring, and error-prone.

Once achieved, the automation gives one the time to think about the message in the data, and the time to translate the insights from the data into value for the organization.
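As a rough illustration of what such a re-runnable analysis can look like, here is a minimal sketch. It is not the scripts described above (those were written in R); it uses Python's standard library, and the column names `condition` and `value`, the function names, and the sample data are all hypothetical.

```python
import csv
import io
import statistics
from collections import defaultdict

def summarize(rows):
    """Group measurements by condition and compute n, mean, and standard deviation."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["condition"]].append(float(row["value"]))
    summary = {}
    for condition, values in sorted(groups.items()):
        summary[condition] = {
            "n": len(values),
            "mean": statistics.mean(values),
            # The sample standard deviation needs at least two values.
            "sd": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
    return summary

def render_report(summary, title="Experiment summary"):
    """Turn the summary into a plain-text report, one line per condition."""
    lines = [title, "=" * len(title)]
    for condition, stats in summary.items():
        lines.append(
            f"{condition}: n={stats['n']}, mean={stats['mean']:.2f}, sd={stats['sd']:.2f}"
        )
    return "\n".join(lines)

def run_analysis(csv_text):
    """Whole pipeline: parse the raw data, summarize it, render the report."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return render_report(summarize(rows))

# Hypothetical data; in practice this would be read from the refreshed data file.
sample = "condition,value\ncontrol,1.0\ncontrol,3.0\ntreated,5.0\n"
print(run_analysis(sample))
```

The point of the structure is that refreshing the data means re-running one function, with no manual steps in between where detail and precision can slip.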

Once you’ve decided that a task, like data analysis, should be automated, you face the question of how to go about automating it. The answer may depend on whether you and your colleagues have the skills to do the automation, or the capital to hire others to do it, or to purchase a drop-in solution that enables automation (e.g., software).

As a grad student, the balance of time and money favored learning the tools (in this case statistics, R, Python, and SQL) to do the automation myself.

However, in a business environment, learning these tools may not be the best option. Learning these tools takes a lot of time and effort, and that equates to an opportunity cost, i.e., your time and energy may be better spent applying your expertise and thinking about your problem domain. Remember that these tools ultimately result in a system that automates the task in question. This system can involve software (with source code), databases (with supporting architecture), web services (with access and security), and the time to address new issues (changes in requirements) as they arise in your process. With these considerations in mind, it may make sense to hire dedicated staff (if the process and system are large enough to warrant this choice), or purchase or subscribe to a software system that meets your automation needs.

Having said that, here’s a shameless plug for a new service we’re offering:

Yukon Data Solutions is our data analytics and report generation service. We work with biotechnology and life science teams to automate their data analytics pipelines. We focus on the rapid development and deployment of lightweight, customized automation solutions. Our goal is to take care of the painful parts of data analysis and report generation, so you can focus on the more important task of thinking about what it all means. Drop us an email if you want to find out whether Yukon Data Solutions might be a good fit for your needs. One last thing: Our service is 100% satisfaction guaranteed.


Discovery versus Engineering

Two modes of work

I want to contrast two modes of work in R&D environments. The first is Engineering mode, and the second is Discovery mode. These modes of work often share similar rules of logic, rely on similar concepts and frameworks, and leverage the same scientific tools and technologies. Engineering and Discovery modes can rely on using the same language (and jargon). Engineering and Discovery modes often occur in similar environments, and even in the same person (albeit usually at different times). In education, these modes are lumped together: The Science & Engineering of STEM roughly map to the Discovery and Engineering modes I’m describing here.

A critical and defining difference between Engineering and Discovery modes, however, is revealed in the distribution of outcomes (or “typical results” and “surprises”) associated with these work modes. Others have argued that the defining difference lies in the flow of information (Drexler, see Review here), but I will focus on outcomes in this post.

For work conducted in Engineering mode, most outcomes (i.e., typical results) look like success. Software engineers produce software that works 99% of the time. Bridge and aeronautical engineers produce bridges and airplanes that work 99.999999% of the time. My numbers might be off, but the idea is that typical results are successful results, and moreover, typical results are often close to perfect.

For Discovery mode, the expectation of outcomes is the opposite. Most outcomes look like failure. Think of gold mining, oil field exploration, and drug discovery, where success happens 1%, 0.01%, or 0.00001% (or less!) of the time. My point is that typical outcomes are failures and misses.

The asymmetry in typical outcomes between Engineering and Discovery mode activities is illustrated below.


The histograms illustrate the outcomes (“Payoffs”) for Discovery (top panel) and Engineering (bottom panel) modes. Payoffs can be negative (red) or positive (black). Surprises (extreme and rare events) are either pleasant (Discovery) or catastrophic (Engineering). Where do you want to operate?


For Engineering mode (bottom panel), most outcomes are positive, but not that positive. When surprising events occur for Engineering mode, they are almost always surprising in a bad way and can be disastrous. Trains, planes and automobiles mostly run on time, except when they don’t, and then they are almost always late.

For Discovery mode (top panel), most outcomes are negative, but not that negative. When your latest batch of compounds fails to have an effect on a cancer cell line, you might be disappointed, but you probably aren’t surprised. But for Discovery mode, when surprising events do occur, they are almost always surprising in a good way (jackpot!). The stream of misses and failures is explained as the cost of doing business.
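The two payoff shapes described here can be mimicked with a small simulation. This is a sketch under made-up assumptions, not the data behind the histograms: the 1% chance of a surprise, the payoff ranges, and the function names are all hypothetical.

```python
import random

def simulate_payoffs(mode, n=10_000, rare_prob=0.01, seed=0):
    """Draw n payoffs for one work mode (all numbers are illustrative assumptions).

    Typical outcomes are small: positive for Engineering, negative for Discovery.
    Rare outcomes (probability rare_prob) are large and of the opposite sign.
    """
    if mode not in ("engineering", "discovery"):
        raise ValueError("mode must be 'engineering' or 'discovery'")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    typical_sign = 1 if mode == "engineering" else -1
    payoffs = []
    for _ in range(n):
        if rng.random() < rare_prob:
            # Surprise: a disaster for Engineering, a jackpot for Discovery.
            payoffs.append(-typical_sign * rng.uniform(50, 100))
        else:
            # Typical result: modest success or modest failure.
            payoffs.append(typical_sign * rng.uniform(0.5, 1.5))
    return payoffs

def describe(payoffs):
    """Share of positive outcomes, plus the most extreme outcome in each direction."""
    positive_share = sum(p > 0 for p in payoffs) / len(payoffs)
    return {"positive_share": positive_share, "worst": min(payoffs), "best": max(payoffs)}

for mode in ("discovery", "engineering"):
    print(mode, describe(simulate_payoffs(mode)))
```

Under these assumptions, nearly all Engineering payoffs are positive but the worst one is far larger in magnitude than the best, while Discovery shows the mirror image.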

An Antifragile aside


Discovery mode is where it’s at

The asymmetry between Engineering and Discovery modes maps to Nassim Taleb’s Sucker and Non-sucker categories, as described in his books The Black Swan and Antifragile. Here, Engineering is the Sucker and considered fragile because Engineering suffers from “tail events” (and increases in volatility), while Discovery is the Non-sucker and considered antifragile because Discovery benefits from tail events (and increases in volatility). Here, I’m referring to a mode of work. With respect to being an employee (see below), Taleb has argued that employment (dependence on a paycheck from others) is a fragile endeavor (for the employee).

Back to our story…

In larger companies, employees may often take on roles that are predominantly one mode or the other. Scientists/Engineers may often be trying to discover new insights or develop new technology, and thus spend most of their time in Discovery mode. Engineers, operators, and service staff, in contrast, may be working with well-developed systems and operate for the most part in Engineering mode.

In smaller companies (or on individual projects) it isn’t atypical for an individual to transition between Engineering and Discovery modes in the course of working on a project.

To illustrate this point: I recently worked on a data science project that required both modes of operation. The first step was to formulate a question and hypothesis. This was the work of a (Data) Scientist and fit neatly into the Discovery mode. The next step was to design and build a system for capturing and storing data to explore and test the hypothesis. This involved programming a web scraper and a database for storing data. Because this step relied on my deployment of well-understood programming technologies (where I had a high certainty, >90%, that I could accomplish the task), it fit neatly into the Engineering mode. This step culminated in the successful construction and operation of a system to capture and store the data needed to answer (or at least explore) the question of interest. Next, with the data in hand, I shifted back to Discovery mode and began to explore the data. I was looking for patterns and insights. Here, although I had some ideas about what I might (and hoped to!) find, I had no certainty in any particular outcome. The results of this step (thus far) were my findings: insights and ideas about the way the world works and, more importantly, new questions for the next steps of inquiry.

For this project, the Engineering and Discovery modes of work were both necessary. Was I successful? It depends: I am 100% certain that I built a system for capturing and storing the data to answer the question (the Engineering mode activities). The jury is still out on whether the findings from my Discovery mode activities were interesting (or important).

Where do you want to operate?

With respect to your job, occupation, or career, and if you have a choice, where do you want to spend your time, and on what sorts of activities: Discovery or Engineering?

If you are an employee at an R&D company, my answer is “it depends”. If you are an employee, you likely have a manager. And if your manager is a good manager, then she will recognize that there are differences between Discovery and Engineering modes and activities and most importantly outcomes, and set expectations, incentives and compensation to reflect those different modes accordingly.

If, however, your manager is blind to the asymmetry in distributions of outcomes I’ve described above, then the last thing you want is to be misclassified by him and subjected to the expectations (KPIs) and reward structure (incentives and compensation) of the alternate mode. This can be especially bad for Discovery mode activities (and failures) when subjected to the expectations that a manager might have for Engineering mode activities (and successes). (And this is why it is important to distinguish between Data Engineering and Data Science.)

The economic environment under which a company operates can also impact where you want to operate on the Discovery-Engineering spectrum. In boom times, when there is money in the bank and excitement from early wins, it’s great to operate in Discovery mode. If your role, however, is Engineering-focused, then you might grow resentful when others are lauded for their amazing discoveries. My recommendation is to find Discovery mode opportunities in your otherwise Engineering-filled (or Operational) world of activities.

If the economic environment erodes and you typically operate in Discovery mode, however, look out! When budgets get smaller, deadlines tighten, and the easy and early wins give way to missed goals (or losses), people become fearful and conservative. Here, managers may increasingly incentivize employees to act conservatively to reduce the riskiness of their activities (and implicitly reduce the volatility of their results). For the Discovery mode employee, this reduction in volatility is disastrous, because it caps their upside and prevents them from realizing any truly extreme (and beneficial) payoffs. If you find yourself in this position, my recommendation is to look for Engineering mode activities (temporarily) or seek out new employment opportunities.
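A toy calculation shows why capping the upside hurts so much in Discovery mode. The payoffs below are invented for illustration (99 small losses and one large win), and the helper name `mean_payoff` is hypothetical.

```python
def mean_payoff(payoffs, cap=None):
    """Average payoff, optionally limiting every outcome to at most `cap`."""
    if cap is not None:
        payoffs = [min(p, cap) for p in payoffs]
    return sum(payoffs) / len(payoffs)

# Hypothetical Discovery-mode record: 99 small misses and a single jackpot.
discovery = [-1.0] * 99 + [150.0]

uncapped = mean_payoff(discovery)         # the jackpot outweighs the misses
capped = mean_payoff(discovery, cap=10)   # a conservative policy limits the win to 10

print(f"uncapped mean: {uncapped:.2f}, capped mean: {capped:.2f}")
```

In this made-up record the average payoff is positive only because of the single extreme win. Once that win is capped, the same stream of small misses leaves the average negative.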


Variation matters

Engineering mode shuns uncertainty, because uncertainty may involve risk that corresponds to bad surprises. Discovery mode thrives under uncertainty, especially when a rare but beneficial result leads to finding something new, or a reduction of uncertainty in the face of making strategic decisions.

In summary, to understand the distinctions between Discovery and Engineering modes, one needs an appreciation for variation and the underlying distribution of outcomes expected while operating in each mode. Without understanding the asymmetry in their outcome distributions, it would be difficult to convey how these work modes differ.