Scientists rename human genes to stop Microsoft Excel from misreading them as dates

Sometimes it’s easier to rewrite genetics than update Excel

There are tens of thousands of genes in the human genome: minuscule twists of DNA and RNA that combine to express all of the traits and characteristics that make each of us unique. Each gene is given a name and alphanumeric code, known as a symbol, which scientists use to coordinate research. But over the past year or so, some 27 human genes have been renamed, all because Microsoft Excel kept misreading their symbols as dates.

The problem isn’t as unexpected as it first sounds. Excel is a behemoth in the spreadsheet world and is regularly used by scientists to track their work and even conduct clinical trials. But its default settings were designed with more mundane applications in mind, so when a user inputs a gene’s alphanumeric symbol into a spreadsheet, like MARCH1 — short for “Membrane Associated Ring-CH-Type Finger 1” — Excel converts that into a date: 1-Mar.

This is extremely frustrating, even dangerous, corrupting data that scientists have to sort through by hand to restore. It’s also surprisingly widespread and affects even peer-reviewed scientific work. One study from 2016 examined genetic data shared alongside 3,597 published papers and found that roughly one-fifth had been affected by Excel errors.

“It’s really, really annoying,” Dezső Módos, a systems biologist at the Quadram Institute in the UK, told The Verge. Módos, whose job involves analyzing freshly sequenced genetic data, says Excel errors happen all the time, simply because the software is often the first thing to hand when scientists process numerical data. “It’s a widespread tool and if you are a bit computationally illiterate you will use it,” he says. “During my PhD studies I did as well!”

There’s no easy fix, either. Excel doesn’t offer the option to turn off this auto-formatting, and the only way to avoid it is to change the data type for individual columns. Even then, a scientist might fix their data but export it as a CSV file without saving the formatting. Or, another scientist might load the data without the correct formatting, changing gene symbols back into dates. The end result is that while knowledgeable Excel users can avoid this problem, it’s easy for mistakes to be introduced.
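One way to catch the problem before a file ever reaches a spreadsheet is to scan for symbols that resemble dates. What follows is a minimal sketch, not a description of Excel's actual conversion rules: the month list, the pattern, and the names `looks_like_date` and `risky_symbols` are all assumptions made for illustration. It reads the CSV with Python's standard `csv` module, which keeps every cell as plain text.

```python
import csv
import io
import re

# Assumption: a heuristic list of month names and abbreviations.
# This is NOT Excel's documented parsing logic.
MONTHS = ("JAN", "FEB", "MAR", "MARCH", "APR", "APRIL", "MAY", "JUN", "JUNE",
          "JUL", "JULY", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC")

# A month word, an optional hyphen, then a one- or two-digit number.
DATE_LIKE = re.compile(r"(?:%s)-?\d{1,2}" % "|".join(MONTHS), re.IGNORECASE)


def looks_like_date(symbol):
    """Return True if the whole symbol matches the date-like heuristic."""
    return DATE_LIKE.fullmatch(symbol.strip()) is not None


def risky_symbols(csv_text, column):
    """List the values in `column` that a spreadsheet might read as dates."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader if looks_like_date(row[column])]


if __name__ == "__main__":
    # Hypothetical two-row file; the counts are made-up placeholders.
    sample = "symbol,count\nMARCH1,3\nTP53,5\n"
    print(risky_symbols(sample, "symbol"))
```

A check like this only flags candidates for review. It does not repair symbols that have already been turned into dates, since a value such as 1-Mar no longer says which symbol it came from.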

Help has arrived, though, in the form of the scientific body in charge of standardizing the names of genes, the HUGO Gene Nomenclature Committee, or HGNC. This week, the HGNC published new guidelines for gene naming, including for “symbols that affect data handling and retrieval.” From now on, they say, human genes and the proteins they express will be named with one eye on Excel’s auto-formatting. That means the symbol MARCH1 has now become MARCHF1, while SEPT1 has become SEPTIN1, and so on. A record of old symbols and names will be stored by HGNC to avoid confusion in the future.
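For anyone holding older datasets, the renaming amounts to a lookup from old symbol to new. The sketch below is illustrative only: it contains just the two renames named above, and the dictionary `RENAMED` and the function `update_symbol` are hypothetical names, not part of any HGNC tool. A real pipeline would presumably draw on HGNC's own record of previous symbols.

```python
# Only the two renames mentioned in the article; the full list is longer.
RENAMED = {
    "MARCH1": "MARCHF1",
    "SEPT1": "SEPTIN1",
}


def update_symbol(symbol):
    """Return the current symbol for a known old one, else the input unchanged."""
    return RENAMED.get(symbol.strip().upper(), symbol)


if __name__ == "__main__":
    for old in ("MARCH1", "SEPT1", "CARS1"):
        print(old, "->", update_symbol(old))
```

Symbols that are not in the table pass through untouched, so the function is safe to run over a column that mixes old and current names.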

Some 27 genes have been renamed like this over the past year, Elspeth Bruford, the coordinator of HGNC, tells The Verge, but the guidelines themselves weren’t formally announced until this week. “We consulted the respective research communities to discuss the proposed updates, and we also notified researchers who had published on these genes specifically when the changes were being put into effect,” says Bruford.

As Bruford makes clear, the art of naming genes is very much driven by consensus. Like the lexicographers charged with updating dictionaries, the Gene Nomenclature Committee has to be sensitive to the needs of those individuals who will be most affected by their work.

This wasn’t always the case, mind. In the early, frontier days of genetics, gene naming was often a playground for creative scientists, leading to notorious genes like “sonic hedgehog” (yes, named for that Sonic) and “Indy” (short for “I’m not dead yet”; a reference to the gene’s function, which can double the life span of fruit flies when mutated).

Now, though, the HGNC has taken matters firmly in hand, and current guidelines don’t cede much ground to whimsy or ego. The focus is on practical concerns: how do we minimize confusion? For that reason, gene symbols should be unique, and gene names should be brief and specific, says the committee. They cannot use subscript or superscript; can only contain Latin letters and Arabic numerals; and should not spell out names or words, particularly offensive ones (a rule that should hold true “ideally in any language”).

And while the decision to rename genes is not taken lightly, it’s not unusual, says Bruford. Many gene symbols that can be read as nouns have been renamed to avoid false positives during searches, for example. In the past, CARS has become CARS1, WARS changed to WARS1, and MARS tweaked to MARS1. Other changes have been made to avoid insult.

“We always have to imagine a clinician having to explain to a parent that their child has a mutation in a particular gene,” says Bruford. “For example, HECA used to have the gene name ‘headcase homolog (Drosophila),’ named after the equivalent gene in fruit fly, but we changed it to ‘hdc homolog, cell cycle regulator’ to avoid potential offense.”

But Bruford says this is the first time that the guidelines have been rewritten specifically to counter the problems caused by software. So far, the reactions seem to be extremely positive — some would even say joyous.

After geneticist Janna Hutz shared the relevant section of HGNC’s new guidelines on Twitter, the response from the community was jubilant. “THRILLED by this announcement by the Human Gene Nomenclature Committee,” tweeted Hutz herself. “Finally!!!” responded Mudra Hegde, a computational biologist at the Broad Institute in Massachusetts. “Greatest news of the day!” said a pseudonymous Twitter user.

Bruford notes that there has been some dissent about the decision, but it mostly seems to be focused on a single question: why was it easier to rename human genes than it was to change how Excel works? Why, exactly, in a fight between Microsoft and the entire genetics community, was it the scientists who had to back down?

Microsoft did not respond to a request for comment, but Bruford’s theory is that it’s simply not worth the trouble to change. “This is quite a limited use case of the Excel software,” she says. “There is very little incentive for Microsoft to make a significant change to features that are used extremely widely by the rest of the massive community of Excel users.”

Bruford doesn’t seem bitter about the situation, though. After all, she says, it wouldn’t do to wait on a hypothetical Excel update to fix these problems when a long-term solution can be introduced by scientists themselves. Microsoft Excel may be fleeting, but human genes will be around for as long as we are. It’s best to give them names that work.

Correction: The story has been corrected to clarify that Excel users can save spreadsheets that retain their formatting, avoiding the mistake where gene symbols are changed into dates. We regret the error.

SOURCE: The Verge's Website

Posted by Cláudio H. Dahne
