Discovery versus Engineering

Two modes of work

I want to contrast two modes of work in R&D environments. The first is Engineering mode, and the second is Discovery mode. These modes of work often share similar rules of logic, rely on similar concepts and frameworks, and leverage the same scientific tools and technologies. Engineering and Discovery modes can rely on using the same language (and jargon). Engineering and Discovery modes often occur in similar environments, and even in the same person (albeit usually at different times). In education, these modes are lumped together: The Science & Engineering of STEM roughly map to the Discovery and Engineering modes I’m describing here.

A critical and defining difference between Engineering and Discovery modes, however, is revealed in the distribution of outcomes (or “typical results” and “surprises”) associated with these work modes. Others have argued that the defining difference lies in the flow of information (Drexler, see Review here), but I will focus on outcomes in this post.

For work conducted in Engineering mode, most outcomes (i.e., typical results) look like success. Software engineers produce software that works 99% of the time. Bridge and aeronautical engineers produce bridges and airplanes that work 99.999999% of the time. My numbers might be off, but the idea is that typical results are successful results, and moreover, typical results are often close to perfect.

For Discovery mode, the expectation of outcomes is the opposite. Most outcomes look like failure. Visualize gold mines and oil field explorations and drug discovery — where success happens 1%, 0.01% or 0.00001% (or less!) of the time. My point is that typical outcomes are failures and misses.

The asymmetry in typical outcomes between Engineering and Discovery mode activities is illustrated below.


The histograms illustrate the outcomes (“Payoffs”) for Discovery (top panel) and Engineering (bottom panel) modes respectively. Payoffs can be negative (red) or positive (black). Surprises (extreme and rare events) are either pleasant (Discovery) or catastrophic (Engineering). Where do you want to operate?


For Engineering mode (bottom panel), most outcomes are positive, but not that positive. When surprising events occur for Engineering mode, they are almost always surprising in a bad way and can be disastrous. Trains, planes and automobiles mostly run on time, except when they don’t, and then they are almost always late.

For Discovery mode (top panel), most outcomes are negative, but not that negative. When your latest batch of compounds fails to have an effect on a cancer cell line, you might be disappointed, but you probably aren’t surprised. But for Discovery mode, when surprising events do occur, they are almost always surprising in a good way (jackpot!). The stream of misses and failures is explained as the cost of doing business.
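To make the asymmetry concrete, here is a minimal simulation sketch. Every number in it is an assumption chosen for illustration (a 1% chance of a surprise, typical payoffs of 1 unit, surprise payoffs of 100 units); none of it comes from real project data, and the function names are hypothetical.

```python
import random
import statistics

def simulate_payoffs(mode, n=10_000, seed=0):
    """Draw n hypothetical payoffs for one work mode.

    All probabilities and payoff sizes are illustrative assumptions,
    not measurements.
    """
    rng = random.Random(seed)
    payoffs = []
    for _ in range(n):
        rare = rng.random() < 0.01  # assumed: 1% of outcomes are surprises
        if mode == "discovery":
            # Typical outcome: a small loss; rare outcome: a large win.
            payoffs.append(100.0 if rare else -1.0)
        elif mode == "engineering":
            # Typical outcome: a small win; rare outcome: a large loss.
            payoffs.append(-100.0 if rare else 1.0)
        else:
            raise ValueError(f"unknown mode: {mode}")
    return payoffs

def summarize(payoffs):
    """Return the median (typical) payoff and the payoff largest in magnitude."""
    extreme = max(payoffs, key=abs)
    return statistics.median(payoffs), extreme

for mode in ("discovery", "engineering"):
    typical, surprise = summarize(simulate_payoffs(mode))
    print(f"{mode}: typical payoff {typical:+.0f}, largest surprise {surprise:+.0f}")
```

Under these assumed numbers, the typical (median) payoff and the largest surprise have opposite signs in each mode, which is the mirror-image pattern described above.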

An Antifragile aside


Discovery mode is where it’s at

The asymmetry between Engineering and Discovery modes maps to Nassim Taleb’s Sucker and Non-sucker categories, as described in his books The Black Swan and Antifragile. Here, Engineering is the Sucker and considered fragile because Engineering suffers from “tail events” (and increases in volatility), while Discovery is the Non-sucker and considered antifragile because Discovery benefits from tail events (and increases in volatility). Here, I’m referring to a mode of work. With respect to being an employee (see below), Taleb has argued that employment (dependence on a paycheck from others) is a fragile endeavor (for the employee).

Back to our story…

In larger companies, employees may often take on roles that are predominantly one mode or the other. Scientists/Engineers may often be trying to discover new insights or develop new technology, and thus spend most of their time in Discovery mode. Engineers, operators, and service staff, in contrast, may be working with well-developed systems and operate for the most part in Engineering mode.

In smaller companies (or on individual projects) it isn’t atypical for an individual to transition between Engineering and Discovery modes in the course of working on a project.

To illustrate this point: I recently worked on a data science project that required both modes of operation. The first step was to formulate a question and hypothesis: This was the work of a (Data) Scientist and fit neatly into the Discovery mode. The next step was to design and build a system for capturing and storing data to explore and test the hypothesis. This involved programming a web scraper and database for storing data — and because this step relied on my deployment of well understood programming technologies (where I had a high certainty that I could accomplish the task, >90%), it fit neatly into the Engineering mode. This step culminated in the successful construction and operation of a system to capture and store the data needed to answer (or at least explore) the question of interest.  Next, and with the data in hand, I shifted back to Discovery mode and began to explore the data. I was looking for patterns and insights. Here, although I had some ideas about what I might (and hoped to!) find, I had no certainty in any particular outcome. The results of this step (thus far) were my findings (insights and ideas about the way the world works, and more importantly new questions for next steps of inquiry).

For this project, the Engineering and Discovery modes of work were both necessary. Was I successful? It depends: I am 100% certain that I built a system for capturing and storing the data to answer the question (The Engineering mode activities). The jury is still out about whether the findings from my Discovery mode activities were interesting (or important).

Where do you want to operate?

With respect to your job or occupation or career, and if you have a choice, where do you want to spend your time (and on what sorts of activities?) — on Discovery or Engineering?

If you are an employee at an R&D company, my answer is “it depends”. If you are an employee, you likely have a manager. And if your manager is a good manager, then she will recognize that there are differences between Discovery and Engineering modes and activities and most importantly outcomes, and set expectations, incentives and compensation to reflect those different modes accordingly.

If, however, your manager is blind to the asymmetry in distributions of outcomes I’ve described above, then the last thing you want is to be misclassified by him and subjected to the expectations (KPIs) and reward structure (incentives and compensation) of the alternate mode. This can be especially bad for Discovery mode activities (and failures) when subjected to the expectations that a manager might have for Engineering mode activities (and successes). (And this is why it is important to distinguish between Data Engineering and Data Science.)

The economic environment under which a company operates can also impact where you want to operate on the Discovery-Engineering spectrum. In boom times, when there is money in the bank and excitement from early wins, it’s great to operate in Discovery mode. If your role, however, is Engineering focused, then you might grow resentful when others are lauded for their amazing discoveries. My recommendation is to find Discovery mode opportunities in your otherwise Engineering (or Operational) filled world of activities.

If the economic environment erodes and you typically operate in Discovery mode, however, look out! When budgets get smaller, deadlines tighten and the easy and early wins give way to missed goals (or losses), then humans become fearful and conservative. Here, managers may increasingly incentivize employees to act conservatively to reduce the riskiness of their activities (and implicitly reduce the volatility of their results). For the Discovery mode employee, this reduction in volatility is disastrous, because it caps their upside and prevents them from realizing any truly extreme (and beneficial) payoffs. If you find yourself in this position, my recommendation is to look for Engineering mode activities (temporarily) or seek out new employment opportunities.


Variation matters

Engineering mode shuns uncertainty, because uncertainty may involve risk that corresponds to bad surprises. Discovery mode thrives under uncertainty, especially when a rare but beneficial result leads to finding something new, or a reduction of uncertainty in the face of making strategic decisions.

In summary, to understand the distinctions of Discovery and Engineering modes, one needs to have an appreciation for variation and the underlying distribution of outcomes expected while operating in each mode respectively. Without understanding the asymmetry in their outcome distributions, it would be difficult to convey how these work modes are different.

U-Haul pricing and where people are moving

By Jabus Tyerman

A post by Mark Perry about U-Haul truck rental prices suggested that so many people were moving away from San Jose, CA that it was causing shortages of U-Haul rental trucks in San Jose. In response to high demand for rental trucks, U-Haul was adjusting one-way truck rental prices for leaving San Jose to be many multiples of the prices to move from the same cities to San Jose. For example, Perry reported that the price to rent a truck to move from San Jose to Las Vegas, NV was 16x more than the price to move from Las Vegas to San Jose.

Perry suggested that U-Haul would use dynamic pricing to optimize one-way truck rental prices in response to local supply and demand, and that we could use the ratio of Outbound moving prices to Inbound moving prices to infer net movement of people between pairs of cities. I wondered whether we could use this idea to get a real-time measurement of movement patterns (“migration”) among other US cities.

Because Perry only reported on price imbalances between San Jose and six other cities, I wanted to start by looking more broadly at pricing imbalances across the US. I collected U-Haul pricing data for the 100 largest US cities (by population), with the intention of ranking cities for Outflow (and Inflow), based on average imbalance in truck rental prices.

I paired each focal city (“A”) with every other city (“B”), and used the U-Haul website to collect one-way pricing quotes to rent a 10′ truck to move from A to B (Outbound), and from B to A (Inbound). I calculated the log(Outbound/Inbound) for each city pair, and then used the average result of all city pairs (for each focal city) to generate an index of migration for that city. I called this index the U-Haul Moving Index, or UMI.

For example, using San Jose, CA as the focal city (A): the Outbound and Inbound one-way truck rental prices between San Jose and Tucson, AZ (B) were $1271 and $161 respectively. The Outbound/Inbound ratio is $1271/$161 = 7.9 (and taking the natural log of 7.9 yields 2.07). This value was calculated using San Jose as A, and all other cities as B, and the median value (“UMI”) for San Jose was 0.69.

[I converted the ratio of Outbound/Inbound to a log scale because taking the natural log of a ratio transforms the asymmetric linear scale into a symmetrical log scale. In this way, an Outbound price that is 2x the Inbound price (log ratio of +0.69) has the same magnitude as an Outbound price that is 0.5x the Inbound price (log ratio of -0.69), i.e. a factor of 2 in both cases, differing only in sign.]
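The calculation can be sketched in a few lines of Python. This is a minimal sketch rather than the code I ran: the function names and the toy quotes are hypothetical, and because the text above mentions both an average and a median across city pairs, the summary statistic is left as a parameter.

```python
import math
import statistics

def log_price_ratio(outbound, inbound):
    """Natural log of the Outbound/Inbound price ratio for one city pair."""
    return math.log(outbound / inbound)

def umi(quotes, summary=statistics.median):
    """Summarize the log ratios for one focal city.

    `quotes` maps each partner city to an (outbound, inbound) price tuple.
    `summary` is the statistic applied across city pairs
    (an assumption; swap in statistics.mean if preferred).
    """
    return summary(log_price_ratio(out, inb) for out, inb in quotes.values())

# The San Jose -> Tucson quote from the text.
print(round(log_price_ratio(1271, 161), 2))

# Symmetry: a 2x ratio and a 0.5x ratio differ only in sign.
print(round(log_price_ratio(200, 100), 2), round(log_price_ratio(100, 200), 2))

# Hypothetical quotes for an imaginary focal city (not real U-Haul data).
toy_quotes = {"B1": (400, 200), "B2": (300, 300), "B3": (150, 300)}
print(round(umi(toy_quotes), 2))
```

A positive result means Outbound prices tend to exceed Inbound prices for that focal city, and a negative result means the reverse.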


The main results are illustrated in Figures 1 and 2 below.

Figure 1 ranks cities according to the U-Haul Moving Index (UMI). Cities having positive UMI (purple) are cities where Outbound prices are greater than Inbound prices, suggesting that trucks are in short supply, due to a net outflow of people. Cities having negative UMI (orange) are cities where Outbound prices are less than Inbound prices, due to a net inflow of people.

Figure 1. Cities ranked by Outflow (and Inflow) using the U-Haul Moving Index (UMI). Positive UMI (purple) cities are cities where Outbound prices are greater than Inbound prices (on average), suggesting trucks are in short supply relative to demand, due to a net outflow of people. Negative UMI (orange) cities are cities where Outbound prices are less than Inbound prices (on average), suggesting trucks are in surplus relative to demand, due to a net inflow of people.


Figure 2 is a map of 100 cities, colored by U-Haul Moving Index (UMI), and highlights strong regional patterns in these data.

Regions having positive UMIs (dominated by people leaving the regions) include California, Chicago (and surrounding Lake Michigan states), New York City (and northeastern seaboard states), and Miami, FL.

Regions having negative UMIs (dominated by people arriving in the regions) include the southeastern states, Texas & Oklahoma, Arizona, and Boise, ID.

FIGURE 2. Map of US cities colored by U-Haul Moving Index (size reflects population).

How do these data compare to other studies?

U-Haul has previously analyzed its own data to report on migration trends. U-Haul used the total number of one-way arrivals in 2017 to rank cities as US destinations, and found that Houston, TX and Chicago, IL were the top two destination cities. In contrast, UMI (this analysis) ranked Houston #42 (of 100) among inflow cities, and Chicago as the top outflow city after the California cities (Chicago was ranked 15/100). Additionally, U-Haul’s total one-way arrival method ranked San Jose, CA as #42 in its list of top 50 destination cities, while this study using UMI ranks San Jose, CA as tied for top outflow city (#1 of 100). These differences in ranking may reflect differences in methodology. However, because data in these studies came from different time periods (2017 for one-way arrival data, and June 2018 for UMI data), it is conceivable that differences in city rankings are due to underlying differences in migration patterns rather than methodology. Whether one method is more accurate at describing patterns of migration has not been determined. In my opinion, however, the total one-way arrivals method used by U-Haul appears to ignore the number of one-way departures, and may therefore not present the full accounting of inflows and outflows required to calculate the net flow of people to or from a city. Rental pricing, if dynamically optimized in response to local supply and demand, could better integrate information about inflows and outflows of people.
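To show why counting arrivals alone can mislead, here is a toy sketch with made-up numbers. The two cities and their counts are hypothetical and are not drawn from U-Haul’s data or mine.

```python
# Hypothetical one-way arrival and departure counts (not real data).
moves = {
    "City X": {"arrivals": 1000, "departures": 1500},
    "City Y": {"arrivals": 600, "departures": 300},
}

def net_flow(city):
    """Arrivals minus departures: positive means a net inflow of people."""
    return moves[city]["arrivals"] - moves[city]["departures"]

# Rank cities as destinations two ways: by arrivals only, and by net flow.
by_arrivals = sorted(moves, key=lambda c: moves[c]["arrivals"], reverse=True)
by_net_flow = sorted(moves, key=net_flow, reverse=True)

print(by_arrivals)
print(by_net_flow)
```

In this toy case City X tops the arrivals ranking even though more people leave it than arrive, so the two rankings disagree.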

The company Redfin has used house search data to estimate the movement of people and ranked San Francisco, New York and LA as top outflow cities (with Chicago as #5) — a result more in line with the UMI rankings in this study. However, Redfin also identified Sacramento, Phoenix and Las Vegas as top Destination cities, while the UMI in this study strongly ranked Sacramento as an outflow city and Las Vegas as having more balanced in- and outflow of people. (Phoenix had a moderately negative UMI suggesting it was a moderate inflow city). As with the U-Haul total one-way arrival method, it is unknown whether differences in city rankings between Redfin and this study stem from differences in methodology or differences in moving patterns due to the data being captured during different time periods.

One practical advantage of using the UMI method described in this study over the U-Haul total one-way arrival method and the Redfin home search method is that the U-Haul prices required for UMI calculations are readily available from the U-Haul website, while one-way arrivals are available only to U-Haul, and home search data are available only to Redfin.


I used imbalances in U-Haul pricing data to generate U-Haul Moving Indices (UMI) for 100 US cities. This work builds on the ideas of Mark Perry in order to generate (near) real-time estimates of net people flows (Out- and Inflow) for each city.

While these data may reflect near real-time patterns of migration, they do not explain why people are moving to and from cities. Others have argued that taxes, cost of living, etc. spur people to leave cities and move to where conditions are better.

One of the assumptions I made (as did Perry) is that price is dynamically determined by U-Haul based on local supply and demand of rental trucks. However, U-Haul may use other factors to set truck rental prices.

Follow up

This is a work in progress and I may update this post over time. If you have feedback or questions, I’d love to hear from you.


Cities are top U.S. cities (in lower 48 states) by population, based on 2013 census estimates using data available here.

U-Haul prices were obtained for one-way truck rentals (10′ trucks) collected over ~2 days from the U-Haul website in June 2018.


Thanks to Andy Idsinga for fruitful discussions and insights that motivated this work.


  1. The blog post and study that motivated this work: Mark Perry (Feb 2018) SF Bay Area experiences mass exodus of residents, leading to a shortage of U-Haul trucks and sky-high prices for scarce outbound trucks. AEIdeas Blog. Last accessed 2018-06-18 from URL:
  2. The U-Haul report on migration that used total one-way arrivals data: U-Haul (May 2018) U-Haul Migration Trends: Houston Ranks as No. 1 U.S. Destination. U-Haul Blog. Last accessed 2018-06-18 from URL:
  3. The Redfin study using home searches: Greg McCarriston (Feb 2018) Affordable Inland Metros Drew People from San Francisco, New York and Los Angeles. Redfin Blog. Last accessed 2018-06-14 from URL
  4. Blog post discussing explanations for people moving among cities (and states): Mark Perry (Feb 2018) . AEIdeas Blog. Last accessed 2018-06-18 from URL:


If you’re interested in analyses and graphs like the ones that appear in this article, then you might be interested in a new service I’ve started: Click here to visit Yukon Data Solutions.

Are we a good fit?

It’s important that there is a good fit between me and my clients. Here are some of my  thoughts on assessing goodness of fit.

In an ideal relationship, the client:

  • Is a leader in the biotechnology or life sciences industry. She may not work for a biotech company, but she is involved or affiliated with the biotech industry in some capacity.
  • Is facing a key challenge involving his data, data systems, or work culture around data. He knows his company is not getting the most from its data, and that it is time for some strategy and further investment.
  • Is a decision maker and she is motivated to maximize the value of the solution during  our engagement. This is not necessarily the same thing as minimizing the cost of the solution during our engagement.
  • Recognizes that his situation is likely to differ from other situations, and will require some diagnosis on my part before jumping into a solution.
  • Is prepared to describe the problem or challenge she is facing.
  • Is not simply looking for a pair of hands to execute a self-prescribed task list.

And I can offer my best work when:

  • The engagement starts at a high level with strategy and advice, and transitions to the execution of customized data science deep dives, projects or analyses.
  • The client’s problem is best tackled by a combination of my background in experimental biology, my experience in synthetic biology, and my expertise in developing assays, workflows and data pipelines & systems to support decision making in industry.
  • The situation can benefit from an outside perspective, and a diagnosis that is not affiliated with specific enterprise solutions or products.
  • I use data to test and support a client’s intuitions about their science and process & business.

If these ideas resonate with you, then we just might be a good fit and you should get in touch.