Systems Funnelling

I would like to contribute a phrase to the scientific lexicon: 

Systems Funnelling | ˈsɪstəmz ˈfʌn(ə)lɪŋ | noun 

1 The process whereby thousands of results from a “systems level” investigation (e.g. a genomic screen) are reduced to a single causal target, to appease the human cognitive bias for simple narrative explanations of complex processes. 

Hourglass Experiments

In the abstract sense, experimental systems contain two polar nodes: variable inputs and result outputs. 

In the simplest experiment, all conditions are constant and only a single variable is changed (e.g. +/- Stimulant X). This ensures that any differences in the result output are exclusively products of the variable input.

But scientists are curious and impatient folk. They’re often interested in lots of things and have limited time to produce an answer. Consequently, they test multiple variable inputs concurrently (e.g. +/- Stimulant X, +/- Stimulant Y, +/- Stimulant Z). This is known as a multi-variate experiment. Presuming you can analyse the data, multi-variate experiments make sense. Concurrently testing multiple hypotheses is quicker than testing them one-by-one. 

What about the result outputs? The simplest experiment produces a single result output (e.g. +/- Signal). But again, inquisitive scientists want to know more. So they try to measure multiple result outputs concurrently (e.g. +/- Signal 1, +/- Signal 2, +/- Signal 3). I can’t find an established term for this, so I’ll call this a multi-result experiment. Again, like their multi-variate cousins, multi-result experiments make sense. By collecting more results at once, you obtain faster insight regarding your hypotheses than if you measured each result one-by-one.

Thus, to get the theoretical maximum out of any experiment, we want to test lots of hypotheses (multi-variate) and collect as much data as possible (multi-result). Unfortunately, for many biological fields, this big-input, big-output combination is rarely possible. This is especially true in my own field: proteomics.

For example, consider a classic protein quantification method: the ELISA. A 96-well ELISA microtiter plate can host multiple variable inputs with multiple replicates per variable. You can test a lot of samples with an ELISA. However, ELISAs typically only measure one result output (e.g. a protein antigen). ELISAs have a large number of variable inputs and a small number of result outputs. They are “top-heavy” experiments.

Conversely, consider a SILAC mass-spectrometry experiment. "Light" and "Heavy" SILAC channels can only host one variable (with no replicates). However, LC-MS/MS analysis produces thousands of relative protein level result outputs. SILAC proteomic experiments have a small number of variable inputs and a huge number of result outputs. They are “bottom-heavy” experiments. 

Ideally, to perform multi-variate and multi-result proteomic experiments, we need a way to merge these respective “top-heavy” and “bottom-heavy” approaches. We need “hourglass” experiments.

So how do we develop "hourglass" proteomic experiments?

Collecting thousands of protein measurements via LC-MS/MS is extremely powerful. It makes sense to continue using LC-MS/MS for large result outputs. What we really need to do is increase the number of distinct variable inputs for LC-MS/MS analysis.

So what's stopping us?

Most quantitative proteomic experiments use labels to distinguish between variable inputs. SILAC (or CTAP) and isobaric peptide labeling (e.g. iTRAQ or TMT) support 2-3 and 6-10 variable inputs respectively. By individually labelling each variable prior to mixing, variable-specific technical variation does not bias the result output. Labelling controls variable-to-variable input bias – and this is extremely powerful. Unfortunately, as you can only test labelled variables, the finite number of labels technically limits the number of variable inputs. Labels strangle the number of inputs possible in bottom-heavy LC-MS/MS experiments. 

One approach to increase the number of input variables is to not use labels. To be label-free. Being liberated from labels means you can have as many input variables as you like! The huge problem with label-free quantification is that it is incredibly vulnerable to technical variation. Unless all sample-prep steps are uniformly consistent and reproducible, small technical errors can lead to large result output biases. This is a big problem when using multi-step sample preparation such as that required for phosphoproteomics.

To address this issue, together with my colleagues at the ICR (and CRUK Manchester), I've been working on a method to uniformly enrich phosphorylated peptides for label-free phosphoproteomics. In short, we adapted typical phosphopeptide enrichment protocols to work with a 96-well particle handling robot. We call the method: Automated Phosphopeptide Enrichment (APE).

APE brings two salient advantages to phosphoproteomics:

1) A robot provides uniformity. Consistent, automated enrichment time after time. Every well is the same – every plate is the same. Result outputs are reproducible. This means technical variation is no longer a worry when using label-free quantification.

2) A 96-well plate provides lots of samples. Variable inputs are increased. It means replicates. And replicates mean statistics.

When you combine reproducible result outputs with increased variable inputs you can start doing hourglass phosphoproteomic experiments. To demonstrate this, we tested the phosphoproteomic consequence of oncogenic KRAS (KRAS-G12D) in PDA cells. Using the multi-variate input provided by APE, we tested three cell lines, using three biological replicates and three technical replicates in one experiment (54 variable inputs). Using the multi-result capacity of LC-MS/MS, we quantified 5,481 phosphopeptides. That's a big-input, big-output, hourglass phosphoproteomic experiment. Crucially, this allowed us to identify a core panel of phosphosites that are statistically regulated by KRAS-G12D across all PDA cells. There are no anecdotal results. Some phosphosites we'd seen before but many were totally new. 
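
As an aside, this is where the replicates earn their keep. The snippet below is a minimal sketch of the general idea only (it is not the analysis pipeline from the paper, and all names and numbers are invented): test each phosphosite across replicated conditions, then correct for multiple testing.

```python
# Minimal sketch: per-phosphosite testing across replicated conditions.
# Illustrative only -- the data, column layout and thresholds are invented,
# and this is not the analysis pipeline used in the APE paper.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)

n_sites = 1000                               # phosphosites quantified by LC-MS/MS
control = rng.normal(20, 1, (n_sites, 9))    # 3 biological x 3 technical replicates
kras = rng.normal(20, 1, (n_sites, 9))
kras[:50] += 2                               # pretend 50 sites respond to KRAS-G12D

# Welch's t-test per phosphosite, then Benjamini-Hochberg correction.
pvals = stats.ttest_ind(kras, control, axis=1, equal_var=False).pvalue
reject, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

print(f"{reject.sum()} phosphosites pass a 5% FDR")
```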

This little project has just been published over at Analytical Chemistry. If you're interested in multi-variate phosphoproteomics, take a look.

Proxy Particulars

Bayesian science is orientated by theory and guided by observation. Theory provides initial direction, but it is deliberate, continuous empiricism that chauffeurs theory to verity. This empirical dependency means scientists have to measure things. 

Unfortunately, it is not always possible to measure something directly. I can count how many trees are in a field unaided – but ask me to quantify something I cannot see – such as the number of chlorophyll pigments in the field – and I will need assistance. Help arrives as a “proxy”: something I can measure directly that correlates with the thing I’m interested in. For example, chlorophyll absorbs light at known wavelengths. This absorbance correlates with the abundance of chlorophyll. If I measure the light, I can quantify the chlorophyll. 
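
To make the proxy idea concrete, here is a toy calculation assuming the Beer-Lambert relationship (absorbance = extinction coefficient x path length x concentration). The coefficient and readings are invented, purely for illustration.

```python
# Toy proxy example: infer chlorophyll concentration from measured absorbance,
# assuming the Beer-Lambert law A = epsilon * l * c. Numbers are illustrative.
EPSILON = 7.9e4   # molar extinction coefficient (L mol^-1 cm^-1), hypothetical
PATH_CM = 1.0     # cuvette path length in cm

def chlorophyll_conc(absorbance: float) -> float:
    """Convert a directly measured absorbance into a concentration (mol/L)."""
    return absorbance / (EPSILON * PATH_CM)

for a in (0.12, 0.45, 0.80):
    print(f"A = {a:.2f}  ->  ~{chlorophyll_conc(a):.2e} mol/L")
```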

By measuring a proxy, we can quantify the invisible.

Scientists use established proxies every day. Pre-defined, broadly-applied, quantifiable metrics. Where they come from and why we use them is rarely questioned. Proxies are subordinate tools.

But as science continues to prod the unknown, existing proxies occasionally start to fail. Proxies may not sufficiently correlate with a desired measurement. And if a proxy is inaccurate, it cannot inform a hypothesis. When this happens, proxies – the lowly tools of analysis – require analysis themselves.

In my own field of proteomics, peptide abundance is used as a proxy for protein abundance. (It is much easier to measure peptides than whole proteins.) Proteomics is a nascent campaign – its face is firmly squashed against the unknown. Consequently, proteomic proxies need to be carefully scrutinised.

A big question for proteomics is: which peptides are the best proxies for proteins? 

A new paper published in Nature Methods (from Jonathan Worboys in my ICR lab) addresses this question empirically. Using the cancer kinome as a model system, Worboys et al demonstrate the first systematic evaluation of “quantotypic” peptides. The better a peptide correlates with protein level, the more “quantotypic” it is. Quantotypic peptides are good proxies.
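
The idea is easy to sketch. The snippet below is only an illustration of the concept (it is not the Worboys et al. workflow): rank each peptide by how well its abundance profile across samples tracks a reference protein-level profile, approximated here by the median peptide profile. The data and peptide names are invented.

```python
# Illustrative sketch of "quantotypic" ranking -- not the published method.
# Rank peptides by how well each one tracks the protein-level profile,
# approximated here by the median profile across the protein's peptides.
import numpy as np

rng = np.random.default_rng(1)

true_protein = rng.normal(0, 1, 8)                        # protein level in 8 samples
peptides = {
    "PEPTIDE_A": true_protein + rng.normal(0, 0.1, 8),    # faithful proxy
    "PEPTIDE_B": true_protein + rng.normal(0, 0.5, 8),    # noisier proxy
    "PEPTIDE_C": rng.normal(0, 1, 8),                     # poor proxy
}

reference = np.median(np.vstack(list(peptides.values())), axis=0)

scores = {name: np.corrcoef(profile, reference)[0, 1]
          for name, profile in peptides.items()}

for name, r in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: r = {r:.2f}")
```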

The outcome? Quantotypic peptides can be (and should be) determined empirically. That is, the best proxies – those subordinate tools of empiricism – require individual empirical induction themselves. 

Meta.

A Lovely Pair

No cell exists in isolation. Every cell has a context. Cells are surrounded by nutrients, extracellular matrix, and crucially: other cells.

I’m interested in how cells communicate with each other. Unfortunately, empirical analysis of cell-cell communication in a multicellular environment is limited. This is largely due to the technical difficulty in resolving between different cell lineages in a co-culture.

Here’s the problem: Proteins from one cell type are indistinguishable from those from a different cell type. Proteins from two different cells may have different origins – and therefore contain different information – but they are biochemically identical. This complicates the investigation of multicellular signalling because we have no spatial resolution of a spatial system.

For example, let's say we're interested in how cells in breast cancer communicate with each other. I have epithelial cancer cells and mammary fibroblasts. I mix the epithelial cells with the fibroblasts and leave them to exchange signals. I lyse the co-culture, perform a traditional western blot for my favourite signalling node and get a lovely bright band. Unfortunately, I can’t tell if the band came from the cancer cells or the fibroblasts. My western blot antibody cannot resolve between the same signalling node from the two different cell types. I have no spatial contrast. 

What we need is a way to resolve between two different cell types. We need cell-specific data. 

Last year Gauthier et al (Nature Methods, 2013) reported a potential solution to this problem. The authors called it “Cell Type–specific labelling using Amino acid Precursors” (or CTAP). It’s very clever. 

CTAP uses non-mammalian amino acid processing enzymes to convert lysine-precursors into L-lysine. When expressed in mammalian cells, these enzymes convert their lysine-precursors into L-lysine and the newly formed L-lysine is incorporated into every protein in the cell.

Now here’s the first clever bit: There are two enzymes. 

One enzyme (Lyr) converts a specific precursor (called D-lysine) into L-lysine. The other enzyme (DDC) converts a different precursor (called DAP) into L-lysine. Thus, we have a pair of enzymes capable of producing the same product from two different substrates. 

The second clever bit: The amino acid precursors can be labeled. 

D-lysine can be isotopically labeled “heavy”. The Lyr enzyme can then convert “heavy” D-lysine into “heavy” L-lysine. DAP (the other precursor) can remain unlabeled or “light”. As a result, the L-lysine produced from DDC is “light” and the L-lysine produced by Lyr is “heavy”. Both enzymes produce L-lysine but now we have isotopic resolution between the two outputs. We have contrast.

Let’s return to our breast cancer experiment.

This time, we express the DDC enzyme in our breast cancer cells and the Lyr enzyme in our mammary fibroblasts. The cells are then mixed together and grown in the presence of both “light” DAP and “heavy” D-lysine. In the breast cancer cells, our DDC enzyme converts “light” DAP into “light” L-lysine. As the breast cancer cells don’t have the Lyr enzyme, they can’t use the “heavy” D-lysine. As a result, all the proteins in the breast cancer cells are now “light”. Over in the fibroblasts however, Lyr is converting “heavy” D-lysine into "heavy" L-lysine. The fibroblasts don’t have the DDC enzyme so they can’t use the “light” DAP. Consequently, all the proteins in the fibroblasts are now “heavy”. We have “light” cancer cells and “heavy” fibroblasts.

This approach is conceptually similar to SILAC. The major difference is that whilst SILAC can be used to distinguish different cell types in transient co-cultures (see Jorgensen et al., 2009), CTAP permits the analysis of long-term, continuous co-cultures. 

Now we leave our cells to communicate. But this time – instead of running a western blot to investigate our favourite signalling node – we use mass-spectrometry. The mass-spectrometer can easily resolve between the “light” and “heavy” proteins. As a result, we can easily distinguish between cancer cell proteins (they’re all “light”) and fibroblast proteins (they’re all “heavy”). We have cell-specific data. We have contrast. 
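
If you're curious what that assignment looks like computationally, here is a purely illustrative sketch. It assumes a SILAC-style heavy-lysine shift of roughly +8.014 Da per lysine (13C6 15N2), which may differ from the precursors actually used, and every name and number in it is hypothetical.

```python
# Illustrative only: assign a peptide to its cell of origin from its label.
# Assumes a SILAC-style heavy-lysine shift of ~+8.014 Da per lysine
# (13C6 15N2); the real shift depends on the precursors used.
HEAVY_K_SHIFT = 8.0142   # Da per labelled lysine (assumed)
TOLERANCE = 0.01         # Da, assumed mass tolerance

def cell_of_origin(light_mass: float, observed_mass: float, n_lysines: int) -> str:
    """Call 'cancer cell' (light) or 'fibroblast' (heavy) for one peptide."""
    delta = observed_mass - light_mass
    if abs(delta) < TOLERANCE:
        return "cancer cell (light)"
    if abs(delta - n_lysines * HEAVY_K_SHIFT) < TOLERANCE:
        return "fibroblast (heavy)"
    return "unassigned"

# A peptide with one lysine, observed ~8 Da heavier than its light mass:
print(cell_of_origin(light_mass=1000.500, observed_mass=1008.514, n_lysines=1))
```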

This concept is illustrated below:

CTAP uses cell-specific amino acid processing enzymes (DDC and Lyr) to isotopically label cells. When the co-culture is analysed by mass-spectrometry (LC-MS/MS), we can easily distinguish between the proteins from each cell.

It’s a great idea.

I work on cell-cell communication in Pancreatic Ductal Adenocarcinoma (PDAC) and when I first read about CTAP it really excited me. As a protein biochemist I loved the idea of using enzymes to confer cell-specific labeling and at the ICR I’m privileged to have access to a mass-spectrometer capable of analysing CTAP experiments. 

So last year I decided to try CTAP for myself. I ordered some enzymes and transfected them into my cells. Unfortunately, at first, I couldn’t really get it to work. My DDC enzyme didn’t confer DAP-dependent proliferation and my Lyr enzyme kept flying out of the cells. In theory it should have worked – but the enzymes were letting me down.

CTAP could really change the way I investigate PDAC cell-cell communication so this was extremely frustrating. I knew the idea of CTAP was great but the pragmatic reality was disappointing. 

So I decided to come up with a new pair of CTAP enzymes. I screened a panel of alternative enzymes and engineered the best ones to work more efficiently for CTAP. With help from some colleagues in the Jorgensen Lab, I tested these enzymes and found them to be excellent for CTAP. We ended up with a lovely pair of new CTAP enzymes. 

This little endeavour has just been published as an open access article in Molecular and Cellular Proteomics under the title: “Cell-Specific Labeling Enzymes For Analysis of Cell-Cell Communication in Continuous Co-Culture”. I've deposited the enzyme pair in AddGene so anyone interested in CTAP can use them. If you're interested in cell-specific labelling, take a look.

Metric-ocracy

Ideas are cheap – empiricism is expensive. At some point, someone needs to pay for science. 

There are vastly more researchers than money for research. So that raises the question: Who gets the money? Surely the people who stand the highest possible chance of doing the best future work? It should be a meritocracy. But from the writhing pile of the overeducated – how do funding bodies decide who is worthy of their money?

They need accurate proxies for future success. They need predictive metrics.  

Educational achievement is a traditional predictive metric. If someone was smart in the past, they will be smart in the future. Unfortunately, respected institutions produce so many graduates that alumni often share a common, indistinct qualification. We need a higher-resolution metric for future success. Something that enriches the genuinely-elite signal from the commonly-prestigious noise. We need personal metrics.

‘Narratively complete’ research projects are communicated by their authors as scientific papers. In a scientific utopia, peer-reviewed papers act purely to communicate findings between researchers. However, in the pragmatic, funding-constrained environment of contemporary science, publications are now used as personal metrics for success.

It makes sense: judge scientists by the narrative product of science. 

As with the university system, a hierarchy of locations exists to host each publication. There are ‘luxury’ journals (such as Nature, Science and Cell) and more humble alternatives. Journal prestige traditionally comes from a single metric: Impact Factor (IF).

A journal’s IF is the number of citations received this year by the articles it published in the previous two years, divided by the number of articles it published in those two years. To game the IF system, savvy journals aim to publish highly cited articles and dismiss manuscripts that might not be cited as much. Despite being driven by subjective editorial policies, IF has become synonymous with scientific prestige. Both for luxury journals themselves and for the authors who achieve a luxury byline.
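
For the record, the arithmetic is simple. A worked example with invented numbers:

```python
# Worked (invented) example of a journal Impact Factor calculation.
citations_2013_to_2011_2012_articles = 20_000   # hypothetical citation count
articles_published_2011_2012 = 500              # hypothetical article count

impact_factor = citations_2013_to_2011_2012_articles / articles_published_2011_2012
print(impact_factor)   # 40.0
```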

If we’re looking for predictive proxies of an individual’s performance, what does a high-IF publication tell us?

A ‘normal’ peer-reviewed publication indicates a researcher has narratively completed a research project. A ‘luxury’ peer-reviewed publication indicates a researcher has narratively completed a research project that IF-savvy editorial staff guess will be highly cited. The difference is not the quality of the science – but the subjective citability of the article. 

This is important because luxury publications are now required for career progression. Both with grant allocation and faculty hiring. Funding bodies and host institutions use luxury journal publications as precocious proxies for future impact. 

I see several problems with this approach. Firstly, whether an article gets sent for external peer-review at a luxury journal (the first barrier to publication) is a decision often made by journalists – not active scientists. Their incentive is to keep IF high – not to publish the highest quality science. Secondly, for many funding bodies/institutes, one paper in a high-IF journal seems to be enough. You’ve got a single Nature/Science/Cell paper? Then you’ve produced something an IF-savvy editorial board deems citable. Climb over your peers. You’re in the club.

However, there is a difference between a paper in a high-impact journal and a high-impact paper. The real problem with lauding papers in luxury journals is that it presumes the former guarantees the latter. It’s an indirect proxy used as a direct predictive metric. 

One big paper is not a pattern of success. It’s a huge achievement for any scientist – and one day I’d love to grace such a podium (mainly for the wide exposure) – but it’s a single datapoint. Probably even an outlier. Big papers often have a ‘right-place, right-time’ whiff to them and are influenced by transient fashion. Again there is nothing wrong with this at a personal level. Being fashionable in the right-place, at the right-time is commendable.

Extrapolating from a single, anecdotal datapoint is less admirable. It's an indirect proxy. And if scientists get angry about anything, it’s extrapolation from subjective data. Recently, there’s been a growing rebellion.

From the comfort of his pre-Nobel vantage, Randy Schekman boldly announced:

“Journals like Nature, Cell and Science are damaging science.”

“I have now committed my lab to avoiding luxury journals, and I encourage others to do likewise.”

Easy to say from your Stockholm hotel – having already reaped the rewards of publishing in luxury journals. Still, Schekman’s piece caused a stir because it specifically claimed that in science: 

“The biggest rewards often follow the flashiest work, not the best.”

Of course it depends on how you define ‘best’. But Schekman argues that by using luxury publications as a prestige metric, the scientific community has outsourced quality-definition to journalists. And journalists don’t choose the best science:

“A paper can become highly cited because it is good science – or because it is eye-catching, provocative or wrong. Luxury-journal editors know this, so they accept papers that will make waves because they explore sexy subjects or make challenging claims. This influences the science that scientists do. It builds bubbles in fashionable fields where researchers can make the bold claims these journals want, while discouraging other important work, such as replication studies.”

To this end Schekman suggests we – as a scientific community – need to break the hold luxury journal editorial committees have over us:

“There is a better way, through the new breed of open-access journals that are free for anybody to read, and have no expensive subscriptions to promote. Born on the web, they can accept all papers that meet quality standards, with no artificial caps. Many are edited by working scientists.”

By only sending our work to open-access journals, Schekman believes we can circumvent luxury journal editorial biases and simply publish by merit. Perpetual open-access publishing is easy to propose whilst your Nobel Prize is being put in its display box. I’m not convinced junior researchers can risk this behaviour just yet. In the current climate it would be extremely dangerous for a junior researcher to send all their work to PLoS One and eLife.

But it's an interesting idea. For it to work, the scientific community also needs to shift focus from ‘single luxury papers’ towards ‘multiple high-impact papers’. Recently, there’s been an encouraging trend towards direct quantification for individual scientists. For example, Google Scholar curates all citations from an individual’s publication output. Here’s my page. I like the multitude of direct, personal metrics. Take the h-index: “h-index is the largest number h such that h publications have at least h citations”. The h-index is a broad measure of an individual’s citability. No weight is given to where a paper is published. Only citations. It’s not predictive impact from a single data-point. It’s actual impact from several data-points. 
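
For anyone who hasn't computed one, the h-index is only a few lines of code. A quick sketch, with invented citation counts:

```python
# Compute an h-index from a list of per-paper citation counts (invented data).
def h_index(citations: list[int]) -> int:
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(ranked, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

print(h_index([52, 31, 14, 9, 6, 4, 2, 1, 0]))   # -> 5
```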

So if we combine consistent open-access publishing (as proposed by Schekman) with personal output metrics (such as h-index), scientists could achieve an independent objective meritocracy. We're not there yet – but I hear Schekman-like rumination from more and more people.

I just hope institutes and funding bodies have an ear to this veering zeitgeist.

When those guarding the purse-strings can resolve between prestige and quality – researchers can simply aspire to quality.

I Don't Believe It

I think it's fair to say that scientists are considered ‘knowledgeable’. Definitive “Knowledge Workers” even. They spend years personally contributing to their chosen field and even longer consuming the conclusions of others. They obsess over niches and revel in esotericism. 

They ‘know’ their stuff and people 'believe' them. 

But the concept of ‘knowledge’ in science has always bothered me. If science is an iterative process – and is technically never finished – then claiming to infallibly ‘know’ something is scientifically impossible. Instead, for the scientific method to work, researchers must simultaneously accept that something can be true and that it can be false. This is very different from 'knowing' something is true. It’s a chronic state of cognitive dissonance.

The idea that science produces knowledge is an extremely common misconception. Science does not produce knowledge. Experiments produce evidence that in-turn, can be used to make predictions. The accuracy and power of these predictions creates the illusion of knowledge. 

For example, if I drop a ball and it falls to the floor I have evidence that there is a force directing the ball towards our planet. I can predict that next time I drop the ball, it will not fly upwards towards the moon, but will again fall towards the earth. I can use evidence to predict the future. The behaviour of the ball is currently explained by the curvature of space-time and we can use this model to accurately predict the movement of future objects. However, at no point do we truly ‘know’ space-time is curved – only that this interpretation of evidence currently permits accurate predictions of future behaviour. There might be a mischievous clan of pixies camped in the 11th spatial dimension that love dragging balls towards planets. We don’t know.

So if scientists can’t claim to know anything, why should anyone believe them? If we accept the pedantry of using exact language to invoke an idea, this is an interesting question. It’s interesting because to ‘believe’ something is to accept an idea without evidence. So to ‘believe’ in science is to accept the method of evidence-driven prediction without observing the evidence for it. It’s to have faith in a process that does not accept faith. 

To host this paradox is to fundamentally misunderstand the scientific method.

Adam Blankenbicker (writing for PLOS Blogs) challenges this idea in his recent piece “Why I don’t believe in science…and students shouldn’t either”. Topping his post Blankenbicker provides a salient anecdote:

“I asked one of my colleagues at work, Dr. Briana Pobiner, a paleoanthropologist, “You believe in evolution, right?” I was surprised by how quickly she answered “I don’t believe in evolution – I accept the evidence for evolution.” The believing isn’t what makes evolution true or not, it’s that there is evidence that supports it.”

Kevin Padian, of the University of California, Berkeley adds:

“Saying that scientists ‘believe’ their results suggests, falsely, that their acceptance is not based on evidence, but is based somehow on faith.”

All good stuff. But what caught my eye was the focus on education. Blankenbicker rightly encourages “teaching the process of science, not the belief in science”. I’ve previously argued that teaching ‘scientific facts’ is very different from teaching the scientific method. The authoritarian focus of teaching facts without the supporting evidence gives a tutee no choice but to simply believe the conclusions of a tutor. 

If the resulting tutee 'believes' scientific 'knowledge' then they have fundamentally failed to understand what science is. 

Victor Meldrew had the right attitude.

Parallel Cosmos

Carl Sagan's "Cosmos: A Personal Voyage" is widely considered to be the best science TV show ever made. Two years ago Fox (?!?!?!) announced a contemporary update was in the works. The new version was to be presented by Neil deGrasse Tyson and produced by Seth MacFarlane (?!?!?!?!?!?!). It sounded ridiculous. 

This morning we got our first look at "Cosmos: A Space-Time Odyssey": 

As expected, this update is unlikely to rival Sagan's masterpiece. The biggest clue comes in their respective titles. The new Cosmos is "A Space-Time Odyssey". The trailer shows a bombastic, fantastical, crashing spectacular of cosmological grandeur. In contrast, Sagan's Cosmos was "A Personal Voyage". A subtle, gracious and articulate tour of our universe. For comparison:

A Transition To Transitions

Western blotting is probably the most applied method in contemporary cell biology. Invented in 1979 (see the original paper), western blotting is used to obtain the relative quantification of an individual protein from a complex mixture of other proteins. There are two main steps: First, separate the complex protein mixture by size (typically by SDS-PAGE), and second, detect the protein of interest using an antibody. It's a simple idea and it normally works. 

It's also an extremely analogue technique. It's low throughput, delicate and manual. It requires tweezers, precision, and up until recently - photographic film (seriously). My girlfriend - chronically aware of my culinary incompetence - was shocked to discover my proficiency at folding filo pastry. 10 years of western blotting had yielded a transferable skill: Vegetable Samosas

So it's fiddly and slow. But the real problem with western blotting is its dependence on biology. A blot is only ever as good as its primary antibody. Commercial antibody reagents are stochastically derived from an in vivo humoral immune response and each antibody is different. Some antibodies are great, many antibodies are crap. If you're working with a slightly obscure protein, the chances are your experiments will be limited by the availability of a good antibody. 

In his book “The Nature of Technology”, W. Brian Arthur notes that as technologies age, they become "stretched". That is, modest additions "stretch" the performance of technology towards gradual improvement. This is distinct from true innovation - whereby tools from an alternative "domain" facilitate a dramatic change in a technology (see this old post for more). The western blot is extremely stretched. We've had fluorescent detection, improved transfer procedures and micro-westerns. All modest additions that make western blotting a little better and provide the illusion that the technique is modern.

Western blots have served the scientific community well. Our current knowledge of cell biology is indebted to their competence. But despite modest technical improvements over the years, western blotting will always be dependent on stochastically derived antibodies of variable quality. It's the same problem Towbin et al faced in 1979.

So after 34 years isn't it time we transitioned from the western blot? 

Writing in MCP, Ruedi Aebersold certainly thinks so. As a pioneer of targeted mass-spectrometry, Aebersold ponders: "Western Blots vs. SRM Assays: Time to turn the tables?" (edited for brevity):

"Recently, targeted proteomic methods, specifically selected reaction monitoring (SRM) have become prevalent. The method is conceptually similar to Western blotting. Both use assays that must be developed for each target protein to detect and quantify specific, predetermined (sets of) analytes in complex samples. However, the methods differ substantially in their implementation, the reliability of the resulting assays and the quality of results they produce. A Western blotting assay essentially depends on the specificity of the antibody used. In contrast, a SRM assay depends on multiple parameters, such as the retention time, the mass-to-charge ratio of the precursor ion and selected fragment ions of the targeted peptide and the relative signal intensities of the detected fragment (transition) signals. These values are then weighted and combined to derive a score that indicates the probability that the targeted peptide has been detected."

So SRM is a great technique. A method of the year even. But Aebersold's contention is not the application of SRM, but the perceived superiority of western blotting over SRM:

"Authors who submit papers containing quantitative protein data generated by MS are frequently asked by reviewers to validate some of the values by Western blotting. We believe with the advances that have occurred that this request is now outdated."

Independent of their respective merits, more people are familiar with western blots than with mass-spec data. Thus, when journal reviewers see mass-spec data, they sometimes ask for western blot 'validation' of the induced conclusions. They want researchers to use a worse technique to validate a better one. Think about that for a second. It's like asking a mathematician to check their results on an abacus in case their calculator is wrong.

Sentimentality isn't an easy bedfellow of rationality. Alas, the backlash: 

"We posit that the request to validate quantitative MS data by Western blotting is no longer justified. In fact, considering that the vast majority of protein identifications claimed from biological samples are still derived from Western blotting, it may be time to ‘turn the tables’ and to request that Western blotting results, or at least the assays that support these results, be validated by MS."

Better start clearing out the darkroom. The triple-quads are coming. 

Alternative Wet Labs

Most researchers spend the majority of their day in a lab or office. To be "at work" is to be running experiments, reading papers and on occasion, even writing them. Scientists work hard.

This empirical procession is occasionally punctuated with a break. During the week: A coffee in the canteen. Friday evening: A beer in the pub. What I'm starting to appreciate is that these places are not distinct from work, but an extension of the lab.

Writing for MRC Insight, Katherine Nightingale discusses "The Great Coffee Breakthrough":

"The LMB [Laboratory of Molecular Biology] has three allotted refreshment slots — coffee in the morning, lunch, and tea in the afternoon — where researchers and support staff are encouraged to get together."

Professor Marcus Munafò continues: 

“It’s pot luck who you end up talking to. Talking to people in different areas can sometimes generate a new direction for your research that you wouldn’t have thought of alone. It’s important to recognise the creative aspect of talking with colleagues – and science has a strong creative dimension to it.”

The MRC certainly believe "a few hours a week in the canteen can save many more hours in the lab." 

The ICR doesn't have an equivalent schedule but our lab does have "crossword time". At around 15:00 every day the group assembles in the canteen ready to tackle a cryptic crossword. We do it as a break. For fun. For an opportunity to re-caffeinate. But over time it's become less of a traditional 'break' and more of a forum. It's the one time every day when the lab gets together and works on a common problem. Admittedly the problem usually involves anachronistic naval terminology, roman numerals and anagrams — but it's teamwork nonetheless. It's an excuse for everyone to talk, share thoughts and flaunt their lexicons. We may not be discussing science the whole time, but we leave the process a tighter social group than before we started. And in the collaborative world of modern research, it's valuable for a lab to understand the dialectical methods of its members. 

If the canteen is important for collaboration, the pub appears more suited to creativity. Mikael Cho, writing for LifeHacker, suggests we should "Drink Beer for Big Ideas, Coffee to Get Them Done". Maybe he's a sanguine dipsomaniac, but apparently:

"Researchers found that about five seconds before you have a "eureka moment" there is a large increase in alpha waves that activates the anterior superior temporal gyrus. These alpha waves are associated with relaxation—which explains why you often get ideas while you’re on a walk, in the shower, or on the toilet. Alcohol is a substance that relaxes you, so it produces a similar effect on alpha waves and helping us reach creative insights. Coffee doesn’t necessarily help you access more creative parts of your brain like a couple pints of beer."

So next time you're in the pub just remember: You're creatively preparing your brain for its next coffee break.  

Half Empty

Writing in PLoS One, Mobley et al discuss a worrying trend in data reproducibility. Following a survey of MD Anderson Cancer Center faculty, the authors conclude:

​"50% of respondents had experienced at least one episode of the inability to reproduce published data; many who pursued this issue with the original authors were never able to identify the reason for the lack of reproducibility; some were even met with a less than ‘‘collegial’’ interaction."

I'm surprised it's only 50%. Presumably the other half are far too busy doing 'pioneering research'.

It's a very small study but the results will feel familiar to many researchers. It can be hard enough to reproduce an old method from one's own lab book, let alone an entirely new finding from a group on the other side of the world. Small-scale (i.e. personal) technical infidelity is frustratingly commonplace. It's why researchers repeat their experiments. What's more worrying is the potential source of large-scale (i.e. post-publication) infidelity:

"Almost one third of all trainees felt pressure to prove a mentor’s hypothesis even when data did not support it."

Forcing data to fit a hypothesis is an egregious waste of time. It's a fundamental reversal of the data-driven hypothesis axiom and the literal opposite of the scientific method. It's fraud. Conclusions follow data, not the other way around. 

So what happens when the data doesn't fit the hypothesis? Bayesian inference dictates we re-calculate the hypothesis in light of new data. Great in theory, but what if this happens two months before the end of a big grant and your future career depends on 'big' publications? This hasn't happened to me yet but the thought of it terrifies me. The current funding/employment infrastructure does not reward researchers who spend four years discovering their grant proposal was misplaced. Failure is defined by a lack of publications, not the practice of bad science.
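
For what it's worth, the re-calculation itself is the easy part. A toy Bayesian update, with invented numbers:

```python
# Toy Bayesian update with invented numbers: how belief in a hypothesis
# should shift when a new experiment comes in.
prior = 0.50                 # P(hypothesis) before the experiment
p_data_given_h = 0.10        # P(observed data | hypothesis true)
p_data_given_not_h = 0.60    # P(observed data | hypothesis false)

evidence = p_data_given_h * prior + p_data_given_not_h * (1 - prior)
posterior = p_data_given_h * prior / evidence

print(f"P(hypothesis | data) = {posterior:.2f}")   # ~0.14 -- time to update
```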

When bad science can = improved career prospects, we might fairly say the system is broken.

I'm starting to wonder if having a hypothesis is too much of a career burden. Preconceived ideas fuel confirmation bias and hamper attempts to refute those ideas. A safer option may just be to consistently ask interesting questions. 

An interesting question will always have an interesting answer — whichever way the data goes.