Decoding the Dog Genome

A female Boxer provides the DNA for the first complete sequence of the dog genome—what will it mean to the health of man and dog?
By Mark Derr, November 2008, Updated June 2015

“The dog is everywhere what society makes him,” wrote Charles Dudley Warner in the January 1896 issue of Harper’s New Monthly Magazine. Elaine Ostrander and Heidi Parker update that message in the November 2005 issue of the online journal PLoS Genetics (Public Library of Science Genetics): “The domestication of the dog from its wolf ancestors is perhaps the most complex genetic experiment in history, and certainly the most extensive.” Undeniably, the results of that experiment are directly manifest in the appearance and behavior of the dog.

Since hitching its evolutionary fate to that of humans some 9,000 canine generations ago, the dog has proven the most adaptable, versatile and steadfast of companions, serving as a guard; draft animal; hunter; herder; warrior; entertainer; finder of explosives, contraband, disease and lost souls; healer; therapist; physical and spiritual guide; and friend. Last month saw the public unveiling of the fully sequenced and richly annotated dog genome: the approximately 2.4 billion base pairs of DNA that form its genetic code, each pair made up of A (adenine), which always binds with T (thymine), or C (cytosine), which binds with G (guanine). The dog might now also add to its monikers, shall we say, “genomic consort.”

Of course, the dog is not the first mammal other than humans to have its genome fully sequenced (the mouse, rat and chimp got theirs first), but because of its architectonic breed structure, it might prove the most illuminating. To shift metaphors: geneticists can now use the dog genome sequence like a zoom lens, zeroing in on specific genes and even minute changes within genes, or pulling back to examine broad patterns within the genome, and between it and other genomes, that reveal the evolutionary history of an individual, a breed, a population, or the entire species and genus.

Dog as Cultural Construct
A cultural and biological construct from the start, the dog is a mash of intensive human tinkering and the natural proclivities both of its wolfish forebear and of its randomly breeding dog ancestors. Indeed, the newly released analysis of the sequenced dog genome points to two unmistakable genetic bottlenecks. The first came about 9,000 generations (taken by the sequencers as 27,000 years) ago, when perhaps as few as two tamed wolves produced the first litters of what became dogs. Since then, through what Darwin called conscious and unconscious selection, humans have cleaved the dog into breeds, in effect making it the most variable of mammals in terms of size and shape, with the exception of humans themselves. The second bottleneck came with that breed making, whose most intensive period occurred between 100 and 300 years ago, coincident with the rise of “scientific breeding” and clubs devoted to the cult of purebred dogs, primarily in Europe and North America.


By most estimates, there are today more than 400 breeds worldwide, many with specialized morphologies and behaviors, nearly all genetically isolated. Most of those breeds are also susceptible to at least one of more than 400 genetic disorders, approximately 350 of which are also found in humans, including epilepsy, kidney cancer, deafness, blindness, autoimmune disorders, congenital heart disease, skeletal malformations, neurological abnormalities, bleeding disorders and neuropsychiatric disorders.

Because traits and diseases often cluster according to breed and because breeders maintain extensive pedigrees, canine geneticists have long argued that the dog represents an ideal natural model for examining how genes shape appearance, function, behavior and health. In 1991, Jasper Rine, a geneticist at the University of California, Berkeley, with two researchers in his lab—Mark Neff, a postdoctoral fellow, and Elaine Ostrander, a staff scientist—started the Dog Genome Project to study those issues and in the process create a map of the dog genome they could link to that of the human genome for comparative study. [Ed. Note: See interview with Mark Neff.] From that effort was born a cottage industry, under the informal leadership of Dr. Ostrander (now at the National Human Genome Research Institute of the National Institutes of Health) and involving a small group of scientists worldwide, devoted to sequencing the dog genome, segment by segment. The researchers also lobbied to have the dog genome sequenced as part of the continuing Human Genome Project, in order to complete the task quickly and accurately.

In 2003, scientists at the Institute for Genomic Research and what is now the J. Craig Venter Institute in Rockville, Md., published a proprietary sequence covering 75 percent of the genome of a Standard Poodle, Shadow. By contrast, sequences prepared as part of the Human Genome Project are posted in public data banks in the US, Europe and Japan as soon as possible after they are completed, so that any researcher can use them.

In July 2004, without fanfare, researchers from the Broad Institute at MIT and Harvard and Agencourt Bioscience Corp., of Beverly, Mass., led by Kerstin Lindblad-Toh, deposited in those public data banks their “first draft” sequence of 98 percent of the genetic code for “dog” in general and Tasha, an inbred Boxer from upstate New York, in particular. Their sequence was more complete and considerably more detailed than that of Shadow.

Then came a pause (of the seemingly interminable sort that occurs between the time certain dogs are called while snorfling in the park and the time they decide to respond) devoted to revision, analysis and assigning chunks of the sequence to their appropriate chromosomes. The dog has 38 pairs of autosomal chromosomes, one member of each pair inherited from each parent, plus two sex chromosomes. Because Tasha is female and carries two X chromosomes, Lindblad-Toh’s team did not sequence the Y (male) chromosome.

Finally, at a press conference in Boston on December 7, 2005; in a lengthy article in the prestigious journal Nature on December 8; and in supplemental articles in the December issue of Genome Research, Lindblad-Toh and her team, along with Ostrander and other dog genome scientists, officially unveiled the by-then 99 percent complete sequence of Tasha and a SNP (pronounced “snip”) map showing 2.5 million “single nucleotide polymorphisms,” or single-letter variations, in the genomes of Tasha, nine other purebred dogs, four wolves and a coyote. This map is useful for finding genes and examining interrelations between groups and individuals. The Nature article also contained an analysis by Lindblad-Toh’s research team of the dog genome’s structure and a new look at the dog’s family tree, origins and transformation by humans. (Full disclosure: Bark deadlines being what they are, I did not attend the press conference, which was designed to receive maximum coverage in the daily media.)

The Dog Genome
Much of genomic science is still involved with characterization and description of the DNA sequence and parts therein, genes being only the most famous. It involves naming things previously perceived dimly, if at all, and often of unknown purpose. But without that basic work, the genome is essentially unreadable.

From a certain perspective, the dog is just another mammal, albeit with a genome slightly smaller and “cleaner”—“there is less junk,” Lindblad-Toh said—than that of its human companion or the ubiquitous lab rat, to which the researchers also compared it in Nature. Tucked within the 2.4 billion base pairs of the dog’s DNA are some 19,300 genes. By comparison, the human genome consists of approximately 2.9 billion base pairs of DNA and, at most recent count, approximately 22,000 genes. Approximately 72 percent of the dog genes are orthologous, meaning they correspond on a one-to-one basis with genes found in the human and rat genomes, although their functions might differ.

Comparison of the mouse, human and dog genomes has identified a core 812,000,000 base pairs (5.3 percent of the total human genome) of ancestral sequence common to all three species. This DNA includes the protein-coding sequence (1 to 2 percent of the total genome) as well as specific sequences that control gene expression. This portion of the genome is under what biologists call purifying selection, wherein variations on a gene or changes in a sequence are selected against, or weeded out. The sequencing of additional mammalian genomes, including those of the rhesus monkey, cow, opossum, elephant, rabbit, cat and shrew, should help to sharpen the focus on the DNA definition of mammal-ness.

Despite the similarities among all three species, the genome appears to reflect the social reality of dogs and humans better than taxonomy does; taxonomy places the rat evolutionarily closer to humans than the dog. The researchers reported in Nature that some sets of functional genes, like those involved in brain development, showed signs of having evolved similarly in dogs and humans, and more rapidly than in rats. It is a suggestive finding.

The dog’s value in comparative genomics lies in large measure in its breed structure, and here the researchers lend some support to two recent suggestions that repetitive segments of DNA are somehow tied to the dog’s physical plasticity (its ability to assume so many different shapes and sizes) as well as to its susceptibility to various inherited diseases. Geneticists have focused not only on SNPs, changes in a single base, but also on repetitive blocks of DNA, including “short interspersed nuclear elements” (SINEs) and “tandem repeats.”

SINE elements, as they are known, are repetitive segments of DNA between 150 and 750 bases long. Interspersed throughout a genome, they can copy themselves into new locations over time, and some are species-specific.

On the whole, the dog genome has fewer SINE elements than the rat or human. But it has a “highly active carnivore-specific SINE family” that is full of mutations that vary between breeds, Lindblad-Toh and her co-authors wrote in Nature. These SINE elements are greater in frequency by a factor of at least 10 than any found in humans, and are believed to play a role in gene expression. When inserted into genes, they can cause diseases, like narcolepsy in Doberman Pinschers and centronuclear myopathy (a muscle disease) in Labrador Retrievers.

Wei Wang and Ewen F. Kirkness of the Institute for Genomic Research, writing in Genome Research, argue that SINE elements are a major source of genetic diversity in the dog. Citing their research, Lindblad-Toh and her colleagues speculate in Nature that the variation from SINE elements “has provided important raw material for the selective breeding programs that have produced the wide phenotypic variations among modern breeds.” If so, SINE elements may be what allowed humans to produce everything from the Pug to the Irish Wolfhound.

But no one knows. In December 2004, John W. Fondon, III, and Harold R. Garner of the University of Texas Southwestern Medical Center proposed in the Proceedings of the National Academy of Sciences (in a paper that caught the attention of scientists, if not the press) that changes in the length of “tandem repeats” found within genes are responsible for the phenotypic variation between breeds and the speed with which breeders can change a breed’s appearance. Once called “junk DNA,” like so many other parts of the sequence whose purpose was then unknown, tandem repeats occur when a short pattern of two or more nucleotides repeats itself, back to back, over a stretch of the genome.
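
To make that definition concrete, here is a minimal sketch, assuming DNA is represented as a plain string of the letters A, C, G and T, that counts how many times a short motif repeats back to back. The function name and both example sequences are hypothetical and are not taken from Fondon and Garner's paper; they simply show two versions of the same stretch that differ only in repeat length, the kind of variation the paper points to.

```python
def longest_tandem_run(seq: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` found in `seq`."""
    if not motif:
        raise ValueError("motif must be a non-empty string")
    best = 0
    for start in range(len(seq)):
        count = 0
        pos = start
        # Step forward one motif-length at a time while the motif keeps matching.
        while seq.startswith(motif, pos):
            count += 1
            pos += len(motif)
        best = max(best, count)
    return best


# Two hypothetical versions of the same stretch, differing only in repeat length.
allele_short = "ATG" + "CAG" * 5 + "TAA"
allele_long = "ATG" + "CAG" * 8 + "TAA"
print(longest_tandem_run(allele_short, "CAG"))  # 5
print(longest_tandem_run(allele_long, "CAG"))   # 8
```

The brute-force scan is chosen for readability; real repeat finders work on whole chromosomes and tolerate imperfect copies, which this sketch does not.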

Put simply, tandem repeats are built from far shorter units than SINE elements. Both are suspected of playing a role in creating the plethora of dog breeds. But changes in the timing of development are also believed to be involved. Sorting that out is where the Dog Genome Project began and where it still must go. In that sense, the genome sequence represents a beginning rather than an end.

A Genomic Look at History
It is often remarked and lamented that despite, in some cases, centuries of inbreeding and use of “favored sires” year after year, litter after litter, even the most purebred of dogs continue to vary in appearance and behavior. In other words, they don’t always breed “true” to the breeder’s desire. In behavioral terms, as John Paul Scott and John L. Fuller observed 40 years ago in their seminal book, Genetics and the Social Behavior of the Dog, selective breeding has concentrated different aspects of wolf behavior in different breeds, so that each one represents “one of many possible individual behavioral variations.” Yet Scott and Fuller also pointed out that, for all the specialization, temperament and talent often vary more among dogs within a breed than between breeds.

It is thus poetically fitting and perhaps scientifically significant that genetically, dogs show a similar pattern of homogeneity and variability between and within breeds, especially the modern breeds. In general, they were formed through extensive inbreeding and the use of favored sires, both of which serve to limit genetic diversity. Yet, despite that, or perhaps because the genetic isolation of breeds has not been long enough or as extensive as breeders sometimes claim, those breeds continue to possess a surprising amount of genetic diversity.

That diversity, in turn, makes it easier to find genes associated with various diseases and with physical appearance, as well as, it is thought, with specialized behavior. But to do that, a genetic sleuth, like a hurricane tracker, needs a map with proper coordinates, in this case SNPs.

It is easier to assemble the genomic sequence of a highly inbred animal because the two copies of each chromosome being sequenced are nearly identical. Compared with other Boxers, Tasha’s genome has one SNP (one change in one letter, or nucleotide) every 1,600 base pairs. Less inbred breeds would have more SNPs. Such changes appear randomly throughout the genomes of all animals, primarily in non-coding regions outside genes, where their effect, if any, is uncertain, and far less frequently within genes, where they can alter a protein, sometimes with lethal results.
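
A figure like “one SNP every 1,600 base pairs” is simply a ratio: the length of the compared stretch divided by the number of positions where two sequences disagree. The minimal sketch below assumes two sequences that have already been aligned base for base; real pipelines must also handle gaps, sequencing errors and ambiguous calls, which this ignores. The function name and the made-up test sequence are illustrative only.

```python
def bases_per_snp(seq_a: str, seq_b: str) -> float:
    """Average number of base pairs per single-letter difference between
    two sequences already aligned position by position."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    snps = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    if snps == 0:
        return float("inf")  # no differences found in this stretch
    return len(seq_a) / snps


# Invented example: 16,000 bases with one difference planted every 1,600 bases.
SWAP = {"A": "G", "G": "A", "C": "T", "T": "C"}
seq_a = "ACGT" * 4_000
seq_b = list(seq_a)
for i in range(0, len(seq_a), 1_600):
    seq_b[i] = SWAP[seq_b[i]]
seq_b = "".join(seq_b)
print(bases_per_snp(seq_a, seq_b))  # 1600.0
```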

SNPs persist for hundreds of generations and form distinctive, inheritable clusters or blocks of genetic code on chromosomes that are known as “haplotypes.” Because they are passed on through generations, haplotypes are useful for exploring the evolutionary history of individuals, groups and species, and for seeking out clusters of genes involved in inherited diseases, in morphology and, it is hoped, in behavior. Probing the differences between dogs with congenital heart disease, for example, and dogs without it, researchers would use their SNP map to compare the haplotypes of affected animals with those of disease-free ones, in an effort to find a region or regions on a chromosome that appear to be involved. There, they would focus the search for genes.
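
The comparison described above can be sketched in a few lines. This is a simplified illustration, assuming each sampled chromosome has already been assigned a haplotype label in each region; the region names, haplotype labels and counts below are invented. It only compares raw frequencies; an actual study would apply a statistical test and correct for the many regions examined.

```python
from collections import Counter


def most_different_region(cases, controls):
    """cases/controls: dict mapping region name -> list of haplotype labels,
    one label per sampled chromosome. Returns (region, haplotype, difference)
    for the haplotype whose frequency differs most between the two groups."""
    best = None
    for region in cases:
        case_counts = Counter(cases[region])
        ctrl_counts = Counter(controls[region])
        n_case, n_ctrl = len(cases[region]), len(controls[region])
        for hap in set(case_counts) | set(ctrl_counts):
            # Positive difference: the haplotype is more common among affected dogs.
            diff = case_counts[hap] / n_case - ctrl_counts[hap] / n_ctrl
            if best is None or abs(diff) > abs(best[2]):
                best = (region, hap, diff)
    return best


# Hypothetical data for affected (cases) and disease-free (controls) dogs.
cases = {"region_1": ["H1", "H2", "H1", "H3", "H2", "H1"],
         "region_2": ["H4", "H4", "H4", "H4", "H5", "H4"]}
controls = {"region_1": ["H1", "H2", "H3", "H1", "H2", "H3"],
            "region_2": ["H5", "H6", "H5", "H4", "H6", "H5"]}
print(most_different_region(cases, controls))
```

Here haplotype H4 in region_2 turns up in five of six affected chromosomes but only one of six healthy ones, so region_2 is where the gene search would focus.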

To create a densely detailed SNP map of the dog genome, Dr. Lindblad-Toh’s team partially sequenced the genomes of nine additional dog breeds, four gray wolves from different regions, and a coyote: German Shepherd, Rottweiler, Bedlington Terrier, Beagle, Labrador Retriever, English Shepherd, Italian Greyhound, Alaskan Malamute, Portuguese Water Dog, Chinese gray wolf, Alaskan gray wolf, Indian gray wolf, Spanish gray wolf, and California coyote. They also had the genome sequence of the Standard Poodle, Shadow. The researchers found 2.5 million positions where the various canine genomes differed by a single nucleotide.

Comparing the Boxer to the other breeds and the five wild canids, the researchers found that while the SNP rate between different Boxers was 1 for every 1,600 base pairs, it was around 1 for every 900 base pairs between the Boxer and every other breed except the Malamute, for which it was 1 for every 787 base pairs. According to the breed identification system developed by Elaine Ostrander and Leonid Kruglyak of the Fred Hutchinson Cancer Research Center in Seattle, the Malamute belongs to a class of “ancient dogs” and thus would be expected to be more distant from the modern Boxer than other modern breeds are (the shorter the distance between SNPs, the more distant the relationship). Like the Boxer, the other dogs represented breeds created or consolidated within the past 300 years. Compared with the Boxer, the wolves had ratios of around 1 SNP for every 580 base pairs, and the more distantly related coyote stood at 1 for every 420 base pairs.
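
The spacings in this paragraph are easier to compare once they are turned into rough genome-wide totals. The sketch below divides the article's figure of 2.4 billion base pairs by each spacing. It assumes, unrealistically, that differences are spread evenly along the genome, so the totals are orders of magnitude rather than measurements; the labels are shorthand for the comparisons reported above.

```python
GENOME_BP = 2_400_000_000  # approximate size of the dog genome, per the article

# Base pairs per SNP when each genome is compared with the Boxer (figures from the article).
BP_PER_SNP = {
    "another Boxer": 1_600,
    "most other breeds": 900,
    "Alaskan Malamute": 787,
    "gray wolves": 580,
    "coyote": 420,
}


def expected_differences(bp_per_snp: int, genome_bp: int = GENOME_BP) -> float:
    """Naive genome-wide count of single-letter differences, assuming an even spread."""
    return genome_bp / bp_per_snp


def rank_by_distance(table=BP_PER_SNP):
    """Order comparisons from closest to most distant: more differences = more distant."""
    return sorted(table, key=lambda name: expected_differences(table[name]))


for name in rank_by_distance():
    millions = expected_differences(BP_PER_SNP[name]) / 1e6
    print(f"{name:>18}: about {millions:.1f} million differences")
```

The totals run from about 1.5 million differences between two Boxers to about 5.7 million between the Boxer and the coyote, the same ordering the spacings give.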

The Boxer’s genome represents “a mosaic of long, alternating regions of near-total homozygosity and high heterozygosity,” the researchers reported in Nature. The homozygous regions, wherein both chromosomes in a pair have identical haplotypes, cover 62 percent of the genome; the heterozygous regions, in which the haplotypes are not identical due to SNPs or genetic variations, 38 percent.
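
One rough way to picture how such a mosaic can be detected: walk along a chromosome in fixed windows, count the sites where the two copies differ, and label each window homozygous or heterozygous. The sketch below assumes a list of heterozygous positions is already in hand; the window size, threshold and example positions are invented, and the published analysis used its own, more careful method.

```python
from itertools import groupby


def classify_windows(het_positions, chrom_length, window=100_000, threshold=5):
    """Label each fixed-size window "HOM" (fewer than `threshold` sites where the
    two chromosome copies differ) or "HET". Window and threshold are arbitrary."""
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for pos in het_positions:
        counts[pos // window] += 1
    return ["HOM" if c < threshold else "HET" for c in counts]


def runs(labels):
    """Collapse consecutive identical labels into (label, number_of_windows) runs."""
    return [(label, len(list(group))) for label, group in groupby(labels)]


def fraction_homozygous(labels):
    return labels.count("HOM") / len(labels)


# Invented example: a 1,000,000-bp chromosome whose two copies differ
# (every 5,000 bp) only in windows 3, 4 and 7.
het = [p for w in (3, 4, 7) for p in range(w * 100_000, (w + 1) * 100_000, 5_000)]
labels = classify_windows(het, 1_000_000)
print(runs(labels))                 # [('HOM', 3), ('HET', 2), ('HOM', 2), ('HET', 1), ('HOM', 2)]
print(fraction_homozygous(labels))  # 0.7
```

The runs of identical labels are the “long, alternating regions”; summing the homozygous windows gives a percentage analogous to the 62 percent reported for Tasha.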

The researchers then scanned the genomes of 20 dogs from each of 10 other breeds, and one dog from each of 24 breeds, ranging from the ubiquitous Labrador and Golden Retrievers to the rare Glen of Imaal Terrier, always limiting themselves to purebred dogs registered by the American Kennel Club. Lindblad-Toh’s team reported in Nature that most dog breeds resembled the Boxer in their number of SNPs and in their relative proportions of heterozygosity and homozygosity.

The one exception was the Akita, a Japanese hunting breed that, by Lindblad-Toh’s estimate, dates back some 10,000 years and that passed through a bottleneck in America in the 1940s. The first official Akita in America was a gift from Japan to Helen Keller in 1937.

Using mathematical models that postulated an effective population of 13,000 dogs with an inbreeding coefficient of .12—meaning basically that they are cousins—the researchers concluded that in achieving its current blend of sameness and difference, the dog passed through a major genetic bottleneck 9,000 generations ago, and another 30 to 90 (sometimes given as 50 to 100) generations ago. Assuming a generation time of three years for dogs, they pegged the origin of the dog at 27,000 years ago from perhaps as few as two wolves. Lindblad-Toh said in an e-mail that the founding population of wolves might have been larger—and some geneticists say there must have been several hundred animals involved—but the genome does not appear to record any contribution from them. She also said that there may have been multiple domestication events and back-crosses with wolves at various times and places, as other genetic studies have shown.

I am a fan of ancient dates when it comes to dog origins, the older the better, but this new offering is a sort of compromise between the 15,000 years ago in East Asia proposed in 2002 by Peter Savolainen of the Royal Institute of Technology in Stockholm and his colleagues, and the 40,000 to 135,000 years ago proposed in 1997 by Robert K. Wayne and his lab team at the University of California, Los Angeles (including Savolainen).

Geneticists are divided, but archaeologists are not. Darcy Franklin Morey, an archaeologist at the University of Kansas specializing in the dog, says that 27,000 years ago, like the older dates, fails to coincide with the archaeological record, which goes back only 12,000 to 14,000 years. Morey has a paper forthcoming in the Journal of Archaeological Science arguing that the proliferation of dog burials at that time marks the origin of the dog. He does have a point: as a cultural construct, the dog has left natural history and entered human history. Arguably, then, the molecular clocks used to calibrate its age should be set to human, as well as geologic, time. Beyond that, the choice of three years as the generation time for dogs is unexplained and possibly too long.

A notable difference between dog and wolf is that female dogs enter their first estrus at six months to one year of age. Breeders in the rural South have long bred their dogs at first heat, and it may well be a custom dating back centuries. Australian dingoes, living in packs independent of men, start breeding around two years of age, according to dingo expert Laurie Corbett in The Dingo in Australia and Asia. Beyond that, my searches have turned up no calculation of the generation time for dogs. Clearly, that question needs more examination.
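
Because the model counts generations rather than years, the 27,000-year date is simply 9,000 generations multiplied by the assumed three years per generation, and every date moves in proportion to that assumption. The sketch below makes the conversion explicit. The one- and two-year alternatives are chosen only to echo the breeding ages mentioned above; they are not estimates from the researchers.

```python
DOMESTICATION_GENERATIONS = 9_000        # first bottleneck, per the article
BREED_BOTTLENECK_GENERATIONS = (30, 90)  # second bottleneck, per the article


def generations_to_years(generations: int, years_per_generation: int) -> int:
    """Convert a count of generations into calendar years."""
    return generations * years_per_generation


def date_table(years_per_generation_options=(1, 2, 3)):
    """Map each assumed generation time to (domestication date, breed-bottleneck range)."""
    table = {}
    for gen_time in years_per_generation_options:
        origin = generations_to_years(DOMESTICATION_GENERATIONS, gen_time)
        low, high = (generations_to_years(g, gen_time) for g in BREED_BOTTLENECK_GENERATIONS)
        table[gen_time] = (origin, (low, high))
    return table


for gen_time, (origin, (low, high)) in date_table().items():
    print(f"{gen_time} yr/generation: dog origin ~{origin:,} years ago; "
          f"breed bottleneck {low:,} to {high:,} years ago")
```

At three years per generation the breed bottleneck falls 90 to 270 years ago, in line with the period of intensive breed formation; a shorter generation time pulls both dates proportionally closer to the present.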

In a commentary accompanying Nature’s presentation of the genome, Hans Ellegren, an evolutionary biologist at Uppsala University, raised another qualifier, saying that if “repeated back-crossing has occurred” between dogs and wolves, Lindblad-Toh’s model of the dog’s origins “would have to be revised.”

The Broad Institute team’s analysis appears on firmer footing when it comes to placing breed formation within the past 200 to 300 years. Breeds are formed by consolidating working dogs of an existing type into a more consistent form, by hybridization and inbreeding to reconstruct a breed, and by building on a small number of imported animals. Early dogs bred rather freely, and their haplotypes were broken up, becoming shorter and more scattered over the generations of random breeding. That is because in sexual reproduction each parent’s two copies of a chromosome, one inherited from its own sire and one from its own dam, swap segments (recombine) to form the single copy it passes on, and the crossover points fall in different places each time.

Through inbreeding from a small gene pool to create their particular breed, humans unknowingly selected a small group of overlapping chromosomes carrying the genes for the traits they wanted—and for some diseases they didn’t want. From that came the breed’s distinctive pattern of large homogeneous haplotype blocks and shorter heterogeneous ones, which selective breeding has sustained.

A notable exception to this model appears to be the Labrador Retriever, which replaced the “old yellow dog,” or cur dog, as America’s most abundant big dog and so is less inbred than other breeds, except in some lines. The English Springer Spaniel and Golden Retriever are more inbred than Labs but less so than most other breeds. But all three show the pronounced influence of “favored sires,” whose overuse also serves to limit diversity. In an interview, Elaine Ostrander said research in her lab has indicated a clear genetic break between show Labs and field-trial Labs, and between show and hunting Beagles.

Breeds that passed through a formation bottleneck typically have, in any given region of a chromosome, four of around 10 possible haplotypes, the researchers said. Although the haplotypes and their proportions vary from breed to breed, haplotypes are also shared between them. As the Akita indicates, this arrangement might not extend beyond Western breeds. But it means that researchers can use 10,000 SNPs, a relatively small number, to scan the genomes of dogs within a breed and compare them with those of other dogs in the same breed in order to find genes.
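
Why can so few markers suffice? A back-of-the-envelope sketch, assuming the 10,000 SNPs are spread evenly over the article's 2.4 billion base pairs: they would sit about 240,000 base pairs apart, so a long haplotype block still catches several of them. The one-million-base-pair block length below is a hypothetical illustration, not a figure from the study.

```python
GENOME_BP = 2_400_000_000  # approximate dog genome size, per the article


def average_spacing(n_snps: int, genome_bp: int = GENOME_BP) -> float:
    """Average distance in base pairs between evenly spread markers."""
    return genome_bp / n_snps


def markers_per_block(block_bp: int, n_snps: int) -> float:
    """Expected number of evenly spread markers falling inside one haplotype block."""
    return block_bp / average_spacing(n_snps)


HYPOTHETICAL_BLOCK_BP = 1_000_000  # illustrative block length only
print(average_spacing(10_000))                           # 240000.0
print(markers_per_block(HYPOTHETICAL_BLOCK_BP, 10_000))  # roughly 4
```

The same arithmetic shows why denser chips help: 60,000 evenly spread markers would sit about 40,000 base pairs apart, enough to catch shorter blocks.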

“We now have a whole new tool kit for looking at the evolutionary history of canines and the origins of dogs, where they originated and how they spread, and how often they interbred with local wolf populations,” said Robert K. Wayne, an evolutionary biologist at the University of California, Los Angeles. A 60,000-SNP GeneChip (similar to a computer chip but incorporating DNA) soon to be available from Affymetrix should speed the search for genes that regulate a dog’s phenotype, he added.

Wayne and his lab contributed a new family tree, or phylogeny, of 34 canid species for the Nature paper, showing the time of their emergence during the past 40 million years and solidifying the already strong argument that the wolf is the dog’s nearest relative, while the wolf’s wild kin are, in order, the coyote, golden jackal, Ethiopian wolf, dhole and African wild dog.

A small group of scientists believes that one of the dog’s more distant relatives, the fox, specifically a colony of tame foxes in Siberia, might hold the key to the genetic changes underlying domestication. They are using the dog genome, said one of the leaders in that quest, Gregory M. Acland, a geneticist with the James A. Baker Institute for Animal Health at Cornell. Like a number of other geneticists, Acland predicts that within 20 years, SNP maps will be rendered obsolete as chips and programs are developed that allow the entire genome of an individual, or parts thereof, to be sequenced cheaply and quickly.

He likens genomics to a sophisticated video game. “You start playing this game,” he said, “killing everything you see or collecting things, and after you’ve killed and collected everything there is and the game seems over, a little box appears in an upper corner. You click on it, and suddenly you’re in a whole new, more complicated level.”


Article first appeared in The Bark, Issue 34: Jan/Feb 2006