Frontiers | Tissue-Based Mapping of the

Frontiers | Tissue-Based Mapping of the Fathead Minnow (Pimephales promelas) Transcriptome and Proteome
ORIGINAL RESEARCH article
Front. Endocrinol.
, 06 November 2018
Sec. Systems Endocrinology
Volume 9 - 2018 |
Published in
Frontiers in Endocrinology
Systems Endocrinology
4.6
impact factor
7.4
citescore
Edited by
Tomer Ventura
University of the Sunshine Coast, Australia
Reviewed by
Shannon William Davis
University of South Carolina, United States
Matthew Brook
University of Edinburgh, United Kingdom
Outline
Figures and Tables
Figure 1
View in article
Figure 2
View in article
Figure 3
View in article
Figure 4
View in article
Figure 5
View in article
Figure 6
View in article
Figure 7
View in article
Figure 8
View in article
Figure 9
View in article
Figure 10
View in article
Table 1
View in article
Table 2
View in article
Table 3
View in article
Table 4
View in article
Table 5
View in article
ORIGINAL RESEARCH article
Front. Endocrinol.
, 06 November 2018
Sec. Systems Endocrinology
Volume 9 - 2018 |
Tissue-Based Mapping of the Fathead Minnow (
Pimephales promelas
) Transcriptome and Proteome
Candice Lavelle
1,2
Ley Cody Smith
2,3
Joseph H. Bisesi Jr.
1,2
Fahong Yu
Cecilia Silva-Sanchez
2,4
David Moraga-Amador
Amanda N. Buerger
1,2
Natàlia Garcia-Reyero
Tara Sabo-Attwood
1,2
Nancy D. Denslow
2,3
1.
Department of Environmental and Global Health, University of Florida, Gainesville, FL, United States
2.
Center for Environmental and Human Toxicology, University of Florida, Gainesville, FL, United States
3.
Department of Physiological Sciences, University of Florida, Gainesville, FL, United States
4.
Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, FL, United States
5.
Environmental Laboratory, US Army Engineer Research & Development Center, Vicksburg, MS, United States
Article metrics
View details
Abstract
Omics approaches are broadly used to explore endocrine and toxicity-related pathways and functions. Nevertheless, there is still a significant gap in knowledge in terms of understanding the endocrine system and its numerous connections and intricate feedback loops, especially in non-model organisms. The fathead minnow (
Pimephales promelas
) is a widely used small fish model for aquatic toxicology and regulatory testing, particularly in North America. A draft genome has been published, but the amount of available genomic or transcriptomic information is still far behind that of other more broadly studied species, such as the zebrafish. Here, we used a proteogenomics approach to survey the tissue-specific proteome and transcriptome profiles in adult male fathead minnow. To do so, we generated a draft transcriptome using short and long sequencing reads from liver, testis, brain, heart, gill, head kidney, trunk kidney, and gastrointestinal tract. We identified 30,378 different putative transcripts overall, with the assembled contigs ranging in size from 264 to over 9,720 nts. Over 17,000 transcripts were >1,000 nts, suggesting a robust transcriptome that can be used to interpret RNA sequencing data in the future. We also performed RNA sequencing and proteomics analysis on four tissues, including the telencephalon, hypothalamus, liver, and gastrointestinal tract of male fish. Transcripts ranged from 0 to 600,000 copies per gene and a large portion were expressed in a tissue-specific manner. Specifically, the telencephalon and hypothalamus shared the most expressed genes, while the gastrointestinal tract and the liver were quite distinct. Using protein profiling techniques, we identified a total of 4,045 proteins in the four tissues investigated, and their tissue-specific expression pattern correlated with the transcripts at the pathway level. Similarly to the findings with the transcriptomic data, the hypothalamus and telencephalon had the highest degree of similarity in the proteins detected. The main purpose of this analysis was to generate tissue-specific omics data in order to support future aquatic ecotoxicogenomic and endocrine-related studies as well as to improve our understanding of the fathead minnow as an ecological model.
Introduction
Omics technologies have significantly improved our understanding of how biological systems work. Their rapid development and the large amount of data generated allowed for the evolution of top-down approaches in order to understand systems that would complement the reductionist bottom-up approaches. These developments enabled rapid and broad characterization of many levels of biology through genome and transcriptome sequencing, proteomics, or metabolomics (
). Due to the extremely rapid advancement of sequencing technologies, it is now faster and more affordable than ever to generate data for genomics and transcriptomics analyses. As a result, omics techniques are increasingly being applied to “unusual” species to generate information that allows better understanding of novel biological characteristics (
) in fields ranging from evolution and adaptation to toxicology and endocrine research (
). A key step in the development of omics applications for endocrine research is to refine their utilization in model species used in understanding both the highly conserved and the species-specific aspects of the endocrine system (
). Here, we aim to further increase our knowledge of the fathead minnow to improve its usefulness as an ecological and endocrine model.
The fathead minnow (FHM,
Pimephales promelas
) is a member of the Cyprinidae family with a broad distribution in aquatic environments, both in running and still waters, across North America (
). They tolerate a wide range of water characteristics, including pH, alkalinity, and temperature (
11
). Fathead minnows are sexually dimorphic and have a rapid life cycle, with a well-defined developmental process, reproductive cycle, and behavior (
12
15
). All of these characteristics together with the well-established methods for its culture and husbandry (
16
) make the FHM suitable as an ecologically relevant fish model. In fact, the FHM is the most frequently used small fish model for regulatory ecotoxicology in North America since the 1950s (
17
). After the US Environmental Protection Agency was established in 1970, the FHM was adopted as a primary model organism for standardized regulatory toxicity testing, leading to the development of numerous testing guidelines (
18
20
). As a consequence, the extensive toxicity data available offers the FHM the greatest potential for linking molecular diagnostic indicators to ecologically relevant outcomes (
17
).
The relatively recent interest in contaminants that act as endocrine disruptors has focused on effects on the endocrine system of fish, since these organisms are present in contaminated environments. Studies analyzing effects on reproduction (
21
23
), thyroid function (
24
25
), neuroendocrine control (
26
28
) or effects on sex differentiation during sensitive periods of development (
29
32
) require good molecular tools for data interpretation. Thus it is important to develop well-annoted sequence databases to have a more comprehensive evaluation of the effects of endocrine disruptors on fathead minnows using functional genomic approaches. In addition, it is important to understand the physiology and endocrinology of this useful species. However, significantly less genetic information is available for the FHM than other models such as the zebrafish (
Danio rerio
), which has an assembled reference genome (
).
The first FHM draft genome was published in 2016 (
33
) and was produced from Illumina sequencing at 120X coverage. The genome annotation was later improved, leading to a total of 43,345 gene predictions (
34
). In addition, a web-accessible genome browser was created, which enables simplified access to the sequence data and its associated annotations (
). Nonetheless, it is crucial to continue increasing our basic understanding of the FHM model by expanding on genome annotation studies, including characterizing both the transcriptome and proteome. This will further facilitate its use in a broad range of applications: from endocrine-related studies, to predictive toxicology and development of computational models, and its use as a surrogate to study other species, including those that are threatened and endangered.
The main objective of this study was to increase the value of the FHM as a model by creating comprehensive transcriptomic and proteomic databases. This study also aims to survey tissue-specific baseline transcriptomic and proteomic expression profiles in select endocrine active organs in adult male FHM to support aquatic ecotoxicogenomic studies.
Materials and methods
Fish rearing
All fish husbandry was conducted under the supervision of the University of Florida Institutional Animal Care and Use Committee. Adult fathead minnows (
Pimephales promelas
) were obtained from an in-house culture at the Aquatic Toxicology Core Laboratory at the University. Fish were maintained in the laboratory in flow-through systems of dechlorinated tap water prior to selection for sequencing experiments.
Fish were sacrificed at different times for three different experiments by submersion in buffered 250 mg/L MS-222 (Western Chemical). Fish tissues were harvested for each experiment and flash-frozen in liquid nitrogen and stored at −80°C until needed. For the PacBio experiment, tissues were harvested from a single male fish, including the whole brain, gut, liver, gonad, heart, gill, head kidney, and trunk kidney. For the RNA-seq experiment, three individual male fish were used, and tissues collected included the telencephalon, hypothalamus, liver, and gut, and the same 4 tissues were collected from two male fish for the proteomics experiment.
RNA extraction and sequencing
Tissue extractions followed procedures previously described (
35
36
). Briefly, tissues were homogenized in RNA Stat-60 (TelTest) using a handheld rotary homogenizer followed by organic separation with chloroform. RNA was then subjected to a second round of RNA Stat-60/Chloroform extraction, followed by precipitation in isopropanol overnight at −20°C. RNA was washed twice with 75% ethanol, dried, and reconstituted in RNAsecure (ThermoFisher Scientific). Reconstituted RNA was DNase-treated to remove possible genomic DNA contamination using Turbo DNase (ThermoFisher Scientific). The quality of the RNA was assessed using an Agilent Bioanalyzer 2100. Only samples with RNA integrity numbers (RINs) exceeding 8 were used for sequencing. Samples were then quantified using a ThermoFisher Scientific Qubit 3.0 fluorimeter.
For the PacBio sequencing, an RNA pool was created by adding equal mass of RNA from each of the extracted tissues (brain, liver, gut, testes, heart, gill, head kidney, and trunk kidney) into the pool. Pools were delivered to the Interdisciplinary Center for Biotechnology Research (ICBR) Sequencing Core Laboratory. For the RNA-seq experiments telencephalon, hypothalamus, liver, and gut tissues from three different fish were kept separate for downstream analysis.
For RNAseq, library preparation and sequencing were performed by Global Biologics LLC (Columbia, MO, USA). Total RNA was quantitated using a Qubit RNA assay kit and Qubit 2.0 fluorometer (Life Technologies Inc.), and RNA integrity was confirmed using the standard sensitivity Fragment Analyzer Total RNA Assay and System (Advanced Analytical Inc.). Briefly, five hundred nanograms of total RNA was used as input material for the Illumina TruSeq Directional v2 high-throughput library construction procedure (Illumina Inc.). Messenger RNA was enriched from total RNA using oligo-dT magnetic beads and fragmented to ~100–300 bp with a single shearing and RT primer hybridization step before generating first- and second-strand cDNA. The resulting DNA was prepared for sequencing by blunt end repair, 3′ adenylation, multiplex compatible adapter ligation (containing TruSeq indexes), and PCR amplification (98°C for 30 s, 11–13 cycles [98°C for 10 s, 60°C for 30 s, and 72°C for 30 s], 72°C for 5 min, and 10°C hold). Library validation was performed using the Fragment Analyzer (Advanced Analytical Inc.) followed by quantitation using the Qubit HS DNA Assay and qPCR Kit for Illumina (Kapa Biosystems Inc). Libraries were diluted based on the quantitation obtained using the Qubit fluorometer and sequenced using one lane (paired-read 100 bp sequencing) on the HiSeq 4000 platform (Illumina Inc.).
Long read sequencing for transcriptome construction
Long read sequencing was performed using the Pacific Biosystems RSII long read sequencer. Full-length, RNA sequencing libraries (i.e., Iso-Seq
TM
) were constructed according to the recommended protocol by PacBio (
37
38
), with a few modifications. Briefly, only RNA preparations with a RIN ≥ 8 were used, as indicated by the Agilent BioAnalyzer or TapeStation. RNA preparations of similar quality from brain, liver, gut, testes, heart, gill, head kidney, and trunk kidney from one male fathead minnow were pooled and used for IsoSeq as a single sample. Briefly, one microgram of total RNA from the pool described above was converted to full-length cDNA using the SMRTer PCR cDNA synthesis reagents (Cat. # 634925) (Clontech, Palo Alto, CA). The number of cDNA amplification cycles was optimized to generate sufficient material that could be used for PacBio SMRT bell library construction over four fraction sizes (0.8–2 kb, 2–3 kb, 3–5 kb, and >5 kb). Fourteen amplification cycles were required. Full-length total cDNA was placed on the ELF SageSciences system (Electrophoretic Lateral Fractionation System). Twelve cDNA fractions were collected, of varying size between 0.8 and ~15 kb. Further amplification was needed to generate enough material (for library construction) for the two larger size bins. Additional amplification of the larger size bins resulted in small size byproducts. Therefore, a second size selection (for 3–5 and >5 kb fragments) was performed using an 11 cm x 14 cm agarose slab gel. Library-polymerase binding was done at 0.01–0.04 nM (depending on library insert size) for sequencing on the PacBio RSII instrument. Diffusion loading was used for the short fragments, while MagBead loading was used for the larger fragments.
Sample cleaning of SageELF fractions and SMRT bell library construction was done following the manufacturer's protocols (
39
). In brief, fractions were purified using AMPure magnetic beads (0.6:1.0 beads to sample ratio). Final libraries were eluted in 15 μL of 10 mM Tris HCl, pH 8.0. Library fragment size was estimated by the Agilent TapeStation (genomic DNA tapes), and this data was used for calculating molar concentrations. Between 75 and 125 pM of library from each size fraction was loaded onto eight SMRT cells for PacBio RS II sequencing. All other sequencing steps were done according to the recommended protocol by the PacBio sequencing calculator and the
RS Remote Online Help
system.
Bioinformatics
De novo
assembly
The raw reads generated from multiple insert-size libraries by PacBio RSII sequencer were processed with PacBio SMRT portal system. The ROI (reads of inserts) from subreads, including the full-length non-chimeric reads, were produced by RS_IsoSeq (
40
). The iterative clustering for error correction (ICE) algorithm and Quiver were applied for improving isoform accuracy and removing redundancy (Table
). All isoform sequences were further clustered and assembled with PTA version 3.0.0 (Paracel Transcript Assembler) (Paracel Inc, Pasadena, CA).
Table 1
Libraries
SMRT cells
ROI
Full length of ROI
Mean length of ROI
Mean quality of ROI
Mean passes
0.8–2kb
96194
61014
1133
0.95
17
2–3kb
92862
37347
1736
0.89
6.6
3–5kb
104117
16308
2216
0.86
2.5
>5kb
71778
14094
2997
0.88
4.5
Total
364951
128763
2020.5
0.895
PACBio sequencing data.
Raw sequencing data generated from illumina NextSeq 500 system were processed with the program Cutadapt (
41
) to trim off sequencing adaptors, primers, and low-quality bases with respect to a quality value cutoff of 20 (phred-like score). With masking and trimming sequencing repeats, primers and/or adaptors used in cDNA library preparation and normalization, the resulting reads with > = 40 bp were assembled using Trinity (
42
), SOAPdenovo (
43
), and Newbler assembler (version 2.8). A hybridized transcriptome assembly of the contigs with ≥ 75 bp from Trinity, SOAPdenovo, and Newbler was performed with PTA version 3.0.0 (Paracel Transcript Assembler) (Paracel Inc, Pasadena, CA). In PTA, the low-quality bases were trimmed and the sequences with length <75 bp and the mitochondrial and ribosomal RNA genes of FHM were excluded from consideration during initial pair-wise comparison. After cleanup, sequences were passed to the PTA clustering module for pair-wise comparison and then to CAP3-based PTA assembly module for assembly.
The consensus sequences resulting from the PTA were annotated against the NCBI NR and NT databases. For each query sequence, the top 100 blast hits were retrieved and the best scoring hit and the tentative GO term from Gene Ontology with e-value ≤ 1e-4 were annotated to query sequences. These GO term assignments were organized around GO hierarchies divided into biological processes, cellular components, and molecular functions. In addition, we also characterized the assembled sequences with respect to functionally annotated genes by BLAST searching against the NCBI reference sequences (RefSeq) of
Danio rerio
(46,757 transcripts).
Analysis of RNA-seq data
Reads acquired from the illumina HiSeq 4000 sequencing platform were cleaned up with the Cutadapt program to trim off sequencing adaptors and low-quality bases with a quality phred-like score < 20. Reads < 40 bases were excluded from RNA-seq analysis. The transcriptome consensus sequences were used as reference sequences for RNA-seq analysis. The cleaned reads of each sample were mapped independently to the
Danio rerio
reference sequences using the mapper of bowtie 2 with a maximum of 3 mismatches for each read. The mapping results were processed with samtools and scripts developed in house at ICBR to remove potential PCR duplicates and choose uniquely mapped reads for gene expression analysis.
Differential gene expression was determined as follows: The number of mapped reads for each individual gene was counted using scripts developed in house at ICBR and analyzed by the DEB application for all pairwise comparisons using the edgeR algorithm and a 5% FDR cutoff (
44
). Significant up- and down-regulated genes were selected using the FDR adjusted
-value, fold-change, or both for downstream analysis.
Confirmation of RNAseq transcripts with quantitative PCR
To cofirm the expression of select transcripts from the RNAseq data set, five healthy male fathead minnows were obtained from culture at the Center for Environmental and Human Toxicology, euthanized, and hypothalamus, telencephalon, liver and gut tissues were collected for RNA extraction and analysis. RNA extraction followed the same procedures described above for RNAseq. Primers were designed and validated for the following transcripts: lipoprotein lipase (lpl), estrogen receptor βb (
er
), peptide transporter 1 (
pept1
), and cytochrome P450 19a1b (
cyp19a1b
). Primer Sequences and conditions are found in Supplementary Table
. Isolated RNA was reverse transcribed into cDNA (Quanta cDNA synthesis kit), and mixed with forward and reverse primers and SYBR Green for amplification and measurement on the BioRAD CFX96 Real-Time PCR Detection System using the following cycling parameters: 95°C for 3 min followed by 40 cycles of 95°C for 10 s, 58–60°C for 30 s (see Supplementary Table
for gene specific annealing temperatures). Replicate gene expression Cq values were normalized to the average Cq value for the hypothalamus for each gene, and presented as average fold change ± standard deviation in each tissue compared to the hypothalamus.
Protein extraction and digestion
Tissue samples were mechanically disrupted in 300 μL RIPA buffer (25 mM Tris–HCl, pH 7.6, 150 mM NaCl, 1% nonylphenoxylpolyethoxylethanol-40, 1% sodium deoxycholate and 0.1% SDS) (Thermo) containing a protease inhibitor tablet (proprietary formulation containing AEBSF HCl, aprotinin, bestatin, E-64, leupeptin, pepstatin, EDTA) (Pierce) and subsequently incubated on ice for 30 min with intermittent vortexing. Samples were spun at 10,000 x g for 20 min at 4°C and supernatants were removed and protein content quantified by Bradford Protein Assay (Biorad). To 100 μL of supernatant, 400 μL of methanol was added followed by vigorous vortexing. Chloroform was added at 1:4 v/v methanol and samples were vigorously vortexed. Thereafter, 300 μL ddH
O was added to the samples and vigorously vortexed. Samples were then spun at 14,000 x g for 2 min at room temperature, the top aqueous layer was removed, and 400 μL methanol was added followed by vigorous vortexing. Samples were spun at 14,000 x g for 3 min and methanol was removed. Samples were dried and resuspended in 100 μL RIPA buffer containing protease inhibitor tablets.
Total protein (100 μg) from each sample was acetone-precipitated. The samples were dissolved in 0.1% SDS, 0.5 M triethylammonium bicarbonate (TEAB), pH 8.5; then reduced, alkylated, trypsin- (Promega, USA) digested and labeled according to manufacturer's instructions (ABsciex Inc. USA). Extra labels were quenched by adding 100 μL of ultrapure water and left at room temperature for 30 min. After quenching, samples were mixed together and dried down in a speedvac. The peptide mixtures were cleaned up with C18 spin columns according to manufacturer's instructions (Supelco, USA). Sample labeling was as follows; gut tissue biological replicates (113 and 118), hypothalamus biological replicates (114 and 117), telencephalon biological replicates (115 and 119), and liver biological replicates (116 and 121). The samples were then dissolved in strong cation exchange (SCX) solvent (25% v/v ACN, 10 mM ammonium formate, pH 2.8) and injected onto a Agilent HPLC 1100 system using a polysulfoethyl A column (2.1 mm x 100 mm, 5 μm, 300 Å, PolyLC, Columbia, USA). The peptides were eluted at a flow rate of 200 μL/min with a linear gradient from 0 to 20% solvent B (25% v/v ACN, 500 mM ammonium formate) over 80 min, followed by a ramping up to 100% solvent B in 5 min and holding for 10 min. The peptides were detected at 214 nm absorbance and a total of 10 fractions were collected.
Mass spectrometry
Each SCX fraction was lyophilized in a speedvac and resuspended in loading buffer (3% acetonitrile, 0.1% acetic acid, 0.01% TFA) and cleaned up with C18 ZipTips according to manufacturer's instructions (Ziptip Millipore). After C18 solid phase extraction, samples were resuspended in loading buffer and 10 μL was injected onto an Acclaim Pepmap 100 precolumn (20 mm x 75 μm; 3 μm-C18) and then separated on a PepMap RSLC analytical column (250 mm x 75 μm; 2 μm-C18) at a flow rate of 350 nL/min on a 1200 nano Easy LC (Thermo Fisher). Solvent A composition was 0.1% formic acid (v/v); whereas solvent B was 99.9% ACN v/v, 0.1% formic acid (v/v). Peptide separation was performed with a linear gradient from 2 to 24% solvent B for 95 min, followed by an increase to 98% solvent B over 15 min and final hold for 10 min. Eluted peptides were directly sprayed onto an Q Exactive Plus hybrid quadrupole-Orbitrap mass spectrometer (ThermoFisher Scientific) for MS/MS analysis. The instrument was run on a data-dependent mode with a full MS scan 400–2,000 m/z and resolution of 70,000. MS/MS experiments were performed for the top 10 most intense ions using the following settings: an HCD NCE = 28%, isolation width = 3 Th, first mass = 105 Th, 5% underfill ratio, peptide match set to “preferred,” and an AGC target of 1e6. Dynamic exclusion for 60 s was used to prevent repeated analysis of the same peptides. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE (
45
) partner repository with the dataset identifier PXD010216. An excel spreadsheet containing mass spectra information for identifying the proteins is found in
supplementary information
Database searching and protein identification
A custom database was constructed for searching protein identification. This database was a composite of an in-house FHM protein database and the zebrafish (
Danio rerio)
database on uniprot. The in-house FHM database was created by selecting the longest open reading frame from the 6-frame translation of each sequence in our transcriptome database consisting of the PacBio reads generated in this study and reads from previous sequencing data from our labs in Blast2Go with the ORF Predictor function. The software chose the longest open reading frame for each sequence, which was subsequently annotated against zebrafish NR database using blastx and blastp and resulted in 56,099 annotated sequences. Once combined with the Uniprot zebrafish protein database our composite database consisted of 117,445 sequences.
The identification and quantification of proteins were performed using ProteinPilot™ Software 5.0.1 (AB SCIEX, Concord, ON) utilizing the Paragon and Progroup algorithms. The previously described protein database was appended before use to include common lab contaminants, and then the entire search field was doubled by the inclusion of decoys for calculating the FDR by the target-decoy method. The search parameters were as follows: iTRAQ 8-plex (peptide labeled), MMTS as a fixed modification on cysteine, trypsin digestion, orbi MS (1-3 ppm), Orbi MS/MS, no special factors, and ID focus of biological modifications and amino acid substitutions. The Unused ProtScore (Conf) was set at > 0.05 (10.0%) and
-value < 0.05 to ensure that quantitation was based on at least three unique peptides.
Additionally, because iTRAQ is a relative quantitation method, all data are reported as ratios of expression against another tissue, we chose hypothalamus. Our samples were expected to have a high percentage of differentially expressed proteins because they originate from different tissues; therefore, no bias or background corrections were applied. For a protein to be used for quantitative analysis and downstream pathway analysis it had to meet a series of conditions: it had to be identified at a 1% global FDR and ratio calculation
-value of < 0.05. Quantified proteins with a
-value >0.05 were not supported with enough evidence to reject the null hypothesis that differences observed in iTRAQ label ratios were random. For each replicate, the ratio to both normalizing hypothalamus replicates was averaged in log space. Then both replicates for each tissue were averaged in log space to calculate the overall tissue ratio.
Pathway analysis
Subnetwork enrichment analysis (SNEA) was conducted in PathwayStudio
TM
10 (Elsevier) operating on the ResNet 11.0 mammalian database using the Fisher's Exact Test Subnetwork Enrichment Analysis option limiting subnetworks to those with
< 0.05.
Results and discussion
The FHM is the model of choice for ecotoxicology in North America as there are many studies relating toxicant exposures to changes in apical endpoints in these fish [for a review, please see (
17
)]. In the present study, we chose one male FHM for single DNA molecule sequencing using the PacBio instrument in order to generate long reads. The transcriptome for FHM was assembled and it was used as a scaffold for interpreting RNA-seq and proteomics data to determine tissue-specific transcripts. The schematic in Figure
describes the overall experimental approach.
Figure 1
Generation of FHM transcriptome
To generate a good quality transcriptome for FHM, we utilized the PacBio instrument, which provides single molecule, full-length transcript sequencing. This instrument can sequence very long reads (up to 100 kb) directly from a single DNA molecule (
46
). This technology sequences DNA from a closed circle using a template called the SMRTbell, which can diffuse into a nano-well called the zero-mode waveguide [for more information about the technology, please see (
47
)]. The circles can be very large and encompass an entire mRNA. This is the ideal instrument to assemble a transcriptome and aid the assembly of a reference genome. One disadvantage that has been pointed out by several studies is its relatively high error rate, about 11–15%, on any read. However, it is possible to work around this error rate as the errors are distributed randomly and the machine can read around the circle multiple times. It has been estimated that a 99% sequence fidelity can be determined by lining up the multiple sequences. PacBio reads are typically longer than the full-length cDNA sequence, allowing each molecule to go through several passes of sequencing. This routinely works, as the read length is up to 100 kb (
47
).
We obtained 30,385 reads from PacBio sequencing, covering a large portion of the transcriptome for a single male. The read lengths ranged from 264 to over 9,720 nts. We binned the sequences into groups based on their lengths with 250 nts per group, giving us 40 different groups (Figure
2A
). We had 17,382 transcripts that were ≥1,000 nts and 182 that were ≥5,000 nts. At the high end of the distribution the five longest transcripts ranged from 7,726 to 9,720 nts long. In addition to transcripts identified by PacBio sequencing, we added sequences that we obtained from several Illumina RNA-seq projects for a large group of fathead minnows. This addition greatly increased the coverage of shorter contig lengths and enhanced some of the longer sequences giving us 21,183 transcripts >1,000 nts and 308 transcripts >5,000 nts (Figure
2B
).
Figure 2
In preparing libraries of cDNA for sequencing by the PacBio instrument, it is possible to use barcodes to identify sequences from different tissues. However, in the present investigation, due to cost, we decided to pool RNAs from a variety of tissues and used a strategy that would ensure some long reads. Also, we wanted to enhance sequences that may lead to the identification of splice variants, as the PacBio is the ideal Next Gen sequencer for this purpose (
48
). For this work, we used a single adult male FHM, to prevent confounding by single polymorphic sequences from a population of fish (Manuscript in preparation).
Tissue-specific transcriptome information for FHM
We performed RNA-seq on hypothalamus, telencephalon, liver and gut of three different adult male FHMs to evaluate tissue-specific expression of genes. For a review of RNA-seq methodologies, please see Bayega et al. (
49
). As expected, each of the tissues, composed of different cell types, showed specific expression fingerprints. Overall, the RNA aligned to 30,378 different putative transcripts in our database. Transcript copies ranged from 0 to 600,000 copies. The mean number of copies of mRNAs in our sampling per tissue ranged from 80 to 266 when sequences with >50 hits were excluded. This is an arbitrary cut off, as some genes with important cellular functions may be expressed with lower copy number, but we think it is a reasonable cut off as estrogen receptor 2b (esr2b) ranged from 243 counts in the telencephalon to 2,517 counts in the liver, values similar to those published by Filby and Tyler using real time qPCR in adult male fathead minnows (
50
) Similarly, esr2a ranged from 35 in the telencephalon to 195 in the gut, relative values again similar to published data. Additionally, there were very low number of hits in males for esr1. Published data indicates that esr1 should be high in the liver of males and not found in the other tissues (
50
) and while we also found that to be the case in our study, the number of hits were well below our cutoff of 50 hits per gene. Pairwise comparisons were made for each tissue for all transcripts that were measured in at least 2 replicates of at least 1 tissue (Supplementary Figure
). Overall, 28,616 transcripts met the requirements for statistical testing in DEB. Of those, 12,610 transcripts were not changed in any of the tissues. These are likely important housekeeping genes that are essential for all tissues. The number of significantly different transcripts varied by tissue and were 200 for hypothalamus to telencephalon (Supplementary Figure
1A
), 11,282 transcripts comparing the hypothalamus to liver (Supplementary Figure
1B
), 10,775 transcripts comparing the liver to telencephalon (Supplementary Figure
1C
), 10,816 for the gut to telencephalon (Supplementary Figure
1D
), 6,237 for gut to hypothalamus (Supplementary Figure
1E
), and 10,143 for gut to liver (Supplementary Figure
1F
). Comparison of expressed genes in the four tissues analyzed is shown in Figure
. It is clear from this heatmap that the telencephalon and hypothalamus share the most expressed genes, with the three biological samples intermingling in the figure, while the gut and the liver are quite distinct. A recently published study mapping the human proteome also found lower correlations between brain and digestive tissues and higher correlations between liver and digestive tissues when investigating transcript expression (
).
Figure 3
A better and more holistic approach to analyzing the data is to compare subnetworks of genes involved in cellular processes for each of the tissues (Tables
, Supplementary Tables
). To do this, FHM transcripts were converted to human homologs, and transcripts that shared the same human homolog were summed. Transcript counts were normalized to the hypothalamus to compare to the proteomics data. Transcripts that were expressed at least 2-fold higher than in the hypothalamus were imported into PathwayStudio
TM
for SNEA.
Table 2
Total # of neighbors
# of measured neighbors
Gene set seed
Median change
-value
215
50
Intestinal absorption
15.92
2.85E-07
122
17
Gut development
31.84
1.19E-05
72
18
Lipid absorption
47.80
5.68E-05
139
27
Lipid export
19.38
2.99E-04
102
19
Bile secretion
29.33
3.77E-04
155
18
Lipoprotein metabolism
15.17
8.50E-04
44
10
Gastrointestinal system absorption
106.84
8.68E-04
66
16
Drug transport
29.33
1.33E-03
47
Gastrointestinal system digestion
115.65
1.45E-03
10
573
47
Energy homeostasis
6.83
1.71E-03
11
209
19
Transcytosis
8.63
1.85E-03
12
100
15
Intestine function
29.33
1.96E-03
13
245
22
Fluid secretion
7.26
2.00E-03
14
159
22
Intestine barrier
23.44
2.13E-03
15
55
13
Gallstone formation
31.43
2.68E-03
Subnetwork enrichment analysis of gene sets specific for the gastrointestinal tract.
As expected, SNEA revealed tissue-specific enrichment of cellular processes relevant to known functions of each tissue. For example, 76 cellular processes had a
-value < 0.05 in the gut, including intestinal absorption, gut development, lipid absorption, gastrointestinal system absorption, and gastrointestinal system digestion (Table
). In the liver, 48 cellular processes had
-values < 0.05 including fibrinolysis, liver development, hepatic regeneration, glycogenesis and glycogen degradation, and liver metabolism (Table
). For the hypothalamus, 100 cellular processes had
-values < 0.05 and are involved in a myriad of processes such as neuron and brain development, nervous system development, neurogenesis, axon cargo transport, locomotion, neuroimmunomodulation, pituitary gland function and hormone generation, transmission of nerve impulse and nerve regeneration and potential, and neuroprotection and neurotransmitter uptake (Table
), underscoring the importance of this part of the brain in controlling multiple organs and their functions. Finally, only 16 cellular processes in the telencephalon had
-values < 0.05 (Table
) and included neuron development, neurogenesis, axogenesis, stem cell proliferation, neuron differentiation, and neuronal plasticity. As expected, there was a lot of overlap between the hypothalamus and the telencephalon, but discrete differences could also be identified.
Table 3
Total # of neighbors
# of measured neighbors
Gene set seed
Median change
-value
174
18
Fibrinolysis
81.23
6.46E-06
78
15
Blood clot lysis
81.23
7.66E-06
242
13
Neutrophil chemotaxis
215.03
3.36E-04
158
16
microcirculation
25.18
3.01E-03
81
Sex maturation
90.17
3.17E-03
13
330
26
Liver development
17.51
1.13E-02
21
438
40
Hepatic regeneration
9.67
1.66E-02
22
409
23
Tissue remodeling
13.26
1.72E-02
30
325
12
Immunomodulation
47.50
2.81E-02
31
631
41
Fertilization
8.51
3.11E-02
33
129
11
Leukocyte accumulation
10.00
3.41E-02
34
52
Glycogenesis
70.15
3.45E-02
37
106
10
Glycogen degradation
9.45
3.73E-02
44
228
31
Liver metabolism
7.90
4.64E-02
46
72
11
Lipid absorption
7.90
4.73E-02
Selected subnetwork enrichment pathways for the liver.
Table 4
Total # of neighbors
Overlap
Percent overlap
Gene set seed
-value
319
Neuron development
6.03E-05
1,100
12
Brain development
1.31E-04
1,017
11
Nervous system development
2.78E-04
12
1,405
13
Neurogenesis
3.36E-04
15
338
Axon cargo transport
6.46E-04
16
951
10
Locomotion
6.76E-04
30
16
11
Neuroimmunomodulation
1.26E-03
32
159
Pituitary gland function
1.56E-03
35
887
Transmission of nerve impulse
1.64E-03
36
19
10
Olfactory bulb development
1.75E-03
37
430
Nerve regeneration
2.21E-03
52
106
Nerve potential
4.53E-03
87
944
Neuroprotection
8.77E-03
94
49
Neurotransmitter uptake
1.06E-02
95
50
Hormone biosynthesis
1.10E-02
Selected subnetwork enrichment pathways for the hypothalamus.
Table 5
Total # of neighbors
# of measured neighbors
Gene set seed
Median change
-value
264
10
Neuron development
9.19
2.23E-03
105
Forebrain development
6.26
3.31E-03
1,129
21
Neurogenesis
3.31
5.76E-03
182
Cell fate specification
3.80
2.08E-02
425
Axonogenesis
4.01
2.23E-02
2,002
26
Transcription activation
3.31
2.36E-02
492
Stem cell proliferation
2.37
3.51E-02
136
Neurulation
6.26
3.57E-02
478
Organogenesis
4.30
3.68E-02
10
471
Neuronal migration
5.87
4.04E-02
11
5,848
63
Cell differentiation
2.77
4.17E-02
12
346
Axon guidance
3.38
4.23E-02
13
207
10
Neuron differentiation
2.82
4.34E-02
14
6,886
62
Cell proliferation
2.61
4.83E-02
15
1,107
23
Cell fate
2.42
4.86E-02
16
446
Neuronal plasticity
7.95
4.89E-02
Selected subnetwork enrichment pathways for the telencephalon.
Although we did not detect mRNA or proteins for all nuclear receptors, we were able to predict which nuclear receptors and transcription factors would be expected to regulate downstream gene expression in each tissue, using the RNA-seq results in a more holistic, network-based approach. Lack of detection of nuclear receptors is a common result due to their poor stoichiometry and this supports the use of network-based analyses to delineate nuclear receptor-mediated signaling mechanisms. We also identified upstream regulatory targets, including transcritption factors and signaling pathway components, that were likely to drive the expression of the genes that were highly expressed in each tissue (Fold Change > 2). A list containing all of the gene symbols and names for transcriptional regulators identified is available in supplemental information (Supplementary Table
). For the gut tissue, 79 expression targets were identified (Figure
4A
), 49 expression targets were identified in the liver tissue (Figure
4B
), and there were 106 combined expression targets for the hypothalamus and telencephalon (Figure
4C
). The liver and gut shared more expression targets (17) than either the liver and brain (2) or the gut and brain (2). Only two expression targets were shared among all tissues, which were two isoforms of fibroblast growth factor (FGF), a mediator of differentiation and development of numerous cell types throughout the body (
51
). Interestingly, in the gut and liver, the majority of the upstream regulatory targets were nuclear transcription factors (48% gut, 47% liver, 31% brain); however, in the brain a higher proportion of the upstream regulatory targets were extracellular proteins and ligands (18% for gut and liver and 27% for brain), or membrane receptors (26% gut, 22% liver, 38% brain). These data are intriguing given the growing appreciation for the importance of membrane receptors and endocrine ligands and their signaling mechanisms in the brain, particularly for neuroendocrine functions and responses to endocrine modulators such as ethinylestradiol or levonorgestrel (
28
52
).
Figure 4
Confirmation of RNAseq transcript data with quantitative PCR
Results from qPCR analysis of select tissue specific transcrips indicated good agreement between RNAseq data and qPCR. RNAseq data indicated that Peptide transporter 1 (
pept1
), a transporter that is responsible for moving small polypeptides from the gastrointestinal lumen into the gastrointestinal system, was highly expressed (>200 fold) in the gastrointestinal tissues, when compared to all other tissues (
36
). Results from the qPCR analysis confirmed this finding with a >50 fold increase in expression in the gastrointestinal tissue (Figure
5A
). Expression of estrogen receptor 2b (
esr2b
) was found to be highest in the liver tissues, followed closely by the gut tissue, with much lower expression in the brain tissues, which was mirrored by the qPCR data (Figure
5B
) and as mentioned above by the work of Filby and Tyler (
50
). Expression of lipoprotein lipase (
lpl
), an enzyme responsible for lipid digestion, was increased 4 fold in the liver when compared to other tissues, which was found to be similar in the qPCR data as well (Figure
5C
). Finally, expression of
cyp19a1b
(aromatase b), an enzyme responsible for conversion of testosterone to estradiol, was high in the brain tissues from the RNAseq datasets, with very little expression in the liver and gut (
53
). These data were also confirmed by qPCR with high levels of expression in the hypothalamus and telencephalon and no detectable levels of expression of this transcript in the liver or gut (Figure
5D
).
Figure 5
Tissue-specific protein expression
Protein identification and differential expression were computed using proteomics specific algorithms, such as Protein Pilot. We obtained 150,150 spectra from the 10 protein salt fractions eluted from the SCX column, and we were able to identify 26,396 distinct peptides at a 1% global FDR, which resulted in the identification of 4,045 protein groups at a 1% global FDR. Of note, a tradeoff exists between database comprehensiveness and redundancy. Only 40% of identified protein groups had unambiguous identifications, suggesting a high level of redundancy in the database. This high level of redundancy is expected because the database consists of both FHM and zebrafish sequences (Supplementary Figure
2A
), and the peptide dynamic range was calculated by ProteinPilot to span 2.95 orders of magnitude.
Overall, an average of 3,840 (range of 3,838–3,841) proteins were quantified in each tissue (Supplementary Figure
2B
). Of those, 69.76% (69.05–69.92%) were supported with enough evidence to calculate a
-value testing the hypothesis that differences observed in iTRAQ label ratios were random. The median log ratio for gut tissue was consistent across both replicates; however, there was a bit of variability between the telencephalon (−0.02 and 0.16) and liver (0.09 and 0.33) replicates. Consistency amongst replicates was the highest for the liver and gut, and lowest for the telencephalon (Supplementary Figure
2C
).
Correlations between expressed proteins among the tissues is shown in Figure
. The most similar were the hypothalamus and telencephalon, with an
value of 0.967 (Figure
6A
). This was expected as there are small differences in structural proteins among different parts of the brain. Comparing proteins of the gut with the liver shows an
value of 0.467 (Figure
6B
). These were the second most similar comparison. There was little similarity between telencephalon and liver (
= 0.089) (Figure
6C
) or between telencephalon and gut (
= 0.175) (Figure
6D
), underscoring the different functions of these disparate tissues.
Figure 6
As previously mentioned, the hypothalamus and telencephalon had a high degree of similarity; however, there were some important differences noted. Specifically, glial fibrillary acidic protein (GFAP) was higher in the telencephalon than in the hypothalamus, while neurofilament medium polypeptide (NEFM) was higher in the hypothalamus. GFAP is an intermediate filament protein that is synthesized only in astrogliocytes in the brain. It provides cytoskeletal structure for these cells and has a critical role in their activation when the brain becomes injured through disease or from traumatic brain injury (
54
). Our data suggests that there may be more astroglial cells in the telencephalon than in the hypothalamus. NEFM is a member of the neurofilament family consisting of light, medium and heavy neurofilaments. These are the major structural components of axons (
55
) and are responsible for the radial growth of the axon. It is clear now that NEFM respond to a myelin signal, probably through a phosphorylation cascade (
55
). Our results suggest that in fathead minnows, the hypothalamus contains more long axons than the telencephalon. This may facilitate longer-range interactions between neurons.
SNEA analysis was clearly able to differentiate tissue-specific biological functions enriched with the proteins identified in the iTRAQ experiment. In the gut, 37 subnetworks were found to be enriched including intestinal barrier, intestine function and lipid adsorption (Supplementary Table
). In the liver, 37 subnetworks were identified including detoxification, xenobiotic clearance, liver metabolism, and liver function (Supplementary Table
). The genes that were higher in the telencephalon and hypothalamus were combined into a single list for the brain, which was used for SNEA. The analysis identified over 100 subnetworks including neurotransmitter secretion, synaptic transmission, regeneration, and brain function (Supplementary Table
).
Comparison of RNA-seq with proteomics
Pairwise comparisons were made to investigate the level of agreement between transcript log ratios obtained from RNA-seq and protein log ratios obtained from iTRAQ. The pairwise comparisons made at human homolog level are shown in Figure
. We had expected to see a positive correlation for each entity between RNA-seq and proteomics for each tissue, but, as can be observed, this is not the case for all genes. A positive log ratio for RNA expression, with a negative log ratio for proteins was not observed in any tissue. In the telencephalon, most log ratios are close to zero as there were few differences from the hypothalamus detected by either RNA-seq or iTRAQ. In the liver, about half (59%) of the genes were in agreement, while the other half had positive protein log ratios and negative RNA log ratios. In the gut, 69% of the genes were in agreement and only 31% had positive protein log ratios and negative RNA log ratios. The slopes of the regression lines are 0.662 (R
= 0.2912), 1.831 (R
= 0.141), and 2.133 (R
= 0.324) for the telencephalon, liver, and gut, respectively. Some of the variation could be due to ratio compression, a well-known artifact of iTRAQ proteomics (
56
57
) given that these slopes are similar to those observed in these other studies.
Figure 7
Additionally, differences between protein and RNA levels for specific genes could be due to differential regulation in translation or turnover rates of protein and/or its transcript. For example, in the liver and the gut, fatty acid binding protein 7 (fabp7) had positive protein log ratios but negative RNA log ratios. These data suggest that the liver and gut have more fabp7 protein than the hypothalamus while there is more message in the hypothalamus (Figure
). The common qPCR reference gene, glyceraldehyde 3-phosphate dehydrogenase (gapdh) also had higher protein levels in the liver compared to the hypothalamus, but less message. Alternatively, there were many cases in which the protein ratios in the liver or gut were positive, but much less than the ratio for RNA. Some examples are fatty acid binding protein 2 (fabp2), dipeptidase 1 (dpep1), and annexin 2 (anxa2) in the gut, carboxypeptidase A1 (cpa1) in the liver and gut, and the fibrinogen subunits (fga, fgg, fgb), 3-oxoacid CoA-transferase 1 (oxct1), urate oxidase (uox), and tetratricopeptide repeat domain 36 (ttc36) in the liver. Conversely, some genes exhibited high protein expression, but low RNA expression. A similar phenomenon has been seen in plants where iron deficiency results in increased protein expression of members of the conserved eukaryotic elongation factor 5A (eIF5A) family without a concordant increase in mRNA abundance (
58
). This can also be explained by differential half-lives, i.e. the half-life of a protein can be much longer than that of the RNA, as is the case for ribosomal proteins. There are roughly ten million ribosomes per eukaryotic cell and they are fairly stable compared to the half-lives of mRNAs for the ribosomal proteins, which are fairly short by comparison (
59
). Proteomics and transcriptomics measurements are made on increases or decreases from the steady state level of these molecules in tissues, which is quite different for mRNA and protein for ribosomes. Further investigations will be needed to determine if variations are an artifact of iTRAQ ratio compression or a true difference in the magnitude of expression.
To examine higher order similarities and differences between the tissue RNA-seq and proteomics datasets, we utilized PathwayStudio
TM
's SNEA on genes and proteins, which measured at least 2-fold higher than in the hypothalamus tissue. A comprehensive list of subnetworks enriched in the RNA-seq and proteomic datasets in each tissue is provided in Supplementary Tables
. Of note, there was overlap in enriched cell processes between transcriptomic and proteomic datasets from each respective tissue. Specifically, there were 8 cell processes common across both datasets in the gut. A subset of these shared cell processes is shown in Figure
all of which are processes that would be expected in the gut, including lipid absorption, lipoprotein metabolism, intestinal barrier function, and general intestinal function. For the liver datasets, 3 common cell processes were found to be enriched and all were related to liver function including hepatic regeneration, liver metabolism, and liver development (Figure
). Finally, when comparing enriched cell processes in the brain between the RNA and protein datasets, 21 cell processes are common between the two datasets. A subset of these process is given in Figure
10
, which demonstrates enrichment of brain development, neurotransmission, regeneration, neurite outgrowth, and nerve cell differentiation. If we examine genes/proteins associated with these overlapping enriched cell processes, we find that only a few are conserved among the two datasets for each tissue, which are circled in green (Gut: 3, Liver: 5, Brain: 2).
Figure 8
Figure 9
Figure 10
Taken as a whole, the RNA and protein datasets identified numerous cell processes that are unique to each dataset. Overlapping cell processes were typically those specific to each tissue, indicating that both measurements are likely to converge on cell processes and functions that are strongly associated with those specific tissues despite very few individual genes/proteins coinciding between the two datasets.
Relationship of findings to endocrinology
It is important for researchers to understand the tissue-specific expression of receptors for peptide and steroid-based hormones. The database we have created by combining the PacBio data set with multiple Illumina RNA-seq data sets will enable researchers to find sequences for genes of interest that may propel their research to a new level. As mentioned above, our data for esr2a and esr2b matched perfectly to data obtained by Northern blots (
50
), thus indicating that the RNA-seq data, despite going through an amplification scheme, closely matches the actual relative concentrations of important genes.
Conclusions
This study is the first to apply single DNA molecule sequencing to generate a transcriptome for FHM. This transcriptome was made up of transcripts from whole brain, gut, liver, gonad, heart, gill, head kidney, and trunk kidney and is robust. It will serve as a good scaffold for future transcriptomics and proteomics projects and may have some utility to help with the FHM genome annotation. In addition, we mapped tissue-specific genes for gut, liver, hypothalamus and telencephalon proteomes and transcriptomes in order to identify and characterize their specific components in each tissue to highlight the utility of our transcriptomic and proteomic sequence databases and to identify cellular pathways enriched during homeostasis that may inform relevant endpoints in future ecotoxicogenomic studies in the ecologically relevant FHM. Our results showed that both the transcriptomes and the proteomes differed by tissue, with the hypothalamus and the telencephalon presenting the highest degree of similarity. The transcriptomic and proteomic sequence information generated in this study will be invaluable in future functional genomic studies investigating the effects of endocrine disrupting chemicals present in the environment on endocrine active tissues of the ecologically-relevent FHM, particularly the neuro-endocrine ssytem. The data is publicly available.
Statements
Data availability statement
RNAseq data can be found at GEO with accession #
GSE119871
Proteomics data sets have been submitted to the ProteomeXchange Consortium via PRIDE with the dataset identifier PXD010216.
Proteomics information for the identification of proteins/peptides from mass spectra will be supplied as an excel spreadsheet upon request by ND.
Ethics statement
This study was carried out in accordance with the recommendations of the University of Florida IACUC committee. The protocol was approved by the University of Florida IACUC committee.
Author contributions
JB, NG-R, TS-A, and ND conceived of the project, helped with analysis and writing of the manuscript. CL, LS, and JB performed the experiments, analyzed data, and contributed to the writing of the manuscritpt. FY performed bioinformatics analysis and annotation for long reads from the PacBio instrument. He also performed the RNA-seq analysis. CS-S performed the iTRAQ experiments by LC MS/MS. CL performed bioinformatics and statistical analysis of the proteomics data and the RNA-seq data. AB and JB performed the qPCR analysis. DM-A discussed experimental strategy and performed the PACBio sequencing. CL, LS, JB, DM-A, FY, CS-S, AB, TS-A and ND wrote sections of the manuscript and all authors have read and approved the submitted version.
Acknowledgments
We wish to acknowledge support from the NSF CBET grant #1605119 to JB and TS-A and NSF EAGER grant #1602318 to TS-A for this project. This work was also partly supported by the US Army Environmental Quality and Installations Research Program (NG-R).
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Supplementary material
The Supplementary Material for this article can be found online at:
Supplementary Figure 1
Pairwise comparisons for differential transcript expression were made for each tissue; hypothalamus to telencephalon
(A)
, hypothalamus to liver
(B)
, liver to telencephalon
(C)
, gut to telencephalon
(D)
, gut to hypothalamus
(E)
, and gut to liver
(F)
. Black dots represent transcripts that were compared and red dots represent transcripts that were found to be statistically different at the 5% FDR cutoff. The data points forming a column on the left most portion of the graph represent transcripts that were measured in only one of the tissues being compared.
Supplementary Figure 2
Quality metrics for iTRAQ data and protein identification.
(A)
Ambiguity was assessed at both the level of protein.
(B)
The number of proteins that we quantified, quantified confidently, and the median log ratio for each iTRAQ label were assessed.
(C)
Correlations coefficients (r) between individual iTRAQ labeled samples are displayed.
Supplementary Table 1
Primer sequences, sources, and efficiencies for qPCR analysis.
Supplementary Table 2
Transcriptomics SNEA illustrating regulation of cell processes in the gut.
Supplementary Table 3
Transcriptomics SNEA illustrating regulation of cell processes in the liver.
Supplementary Table 4
Transcriptomics SNEA illustrating regulation of cell processes in the telencephalon.
Supplementary Table 5
Transcriptomics SNEA illustrating regulation of cell processes in the hypothalamus.
Supplementary Table 6
Expression targets derived from Pathway Studio for gut, liver and brain. These are the genes highlighted in Figure
Supplementary Table 7
Proteomics SNEA results for regulation of cell processes in the gut.
Supplementary Table 8
Proteomics SNEA results for regulation of cell processes in the liver.
Supplementary Table 9
Proteomics SNEA results for regulation of cell processes in the brain.
References
1.
Garcia-Reyero
Perkins
EJ
Systems biology: leading the revolution in ecotoxicology
Environ Toxicol Chem.
2011
30
265
73
10.1002/etc.401
Pubmed Abstract
CrossRef
Google Scholar
2.
Williams
TD
Turan
Diab
AM
Wu
Mackenzie
Bartie
KL
et al
Towards a system level understanding of non-model organisms sampled from the environment: a network biology approach
PLoS Comput Biol.
2011
e1002126
10.1371/journal.pcbi.1002126
Pubmed Abstract
CrossRef
Google Scholar
3.
Armengaud
Trapp
Pible
Geffard
Chaumot
Hartmann
EM
Non-model organisms, a species endangered by proteogenomics
J Proteomics
2014
105
18
10.1016/j.jprot.2014.01.007
Pubmed Abstract
CrossRef
Google Scholar
4.
Uhlen
Fagerberg
Hallstrom
BM
Lindskog
Oksvold
Mardinoglu
et al
Proteomics
Tissue-based map of the human proteome Science
2015
347
1260419
10.1126/science.1260419
Pubmed Abstract
CrossRef
Google Scholar
5.
Garcia-Reyero
Griffitt
RJ
Liu
Kroll
KJ
Farmerie
WG
Barber
DS
et al
Construction of a robust microarray from a non-model species (largemouth bass) using pyrosequencing technology
J Fish Biol.
2008
72
2354
76
10.1111/j.1095-8649.2008.01904.x
Pubmed Abstract
CrossRef
Google Scholar
6.
Garcia-Reyero
Tingaud-Sequeira
Cao
Zhu
Perkins
EJ
Hu
Endocrinology: advances through omics and related technologies
Gen Comp Endocrinol.
2014
203
262
73
10.1016/j.ygcen.2014.03.042
Pubmed Abstract
CrossRef
Google Scholar
7.
Perkins
EJ
Ankley
GT
Crofton
KM
Garcia-Reyero
LaLone
CA
Johnson
MS
et al
Current perspectives on the use of alternative species in human health and ecological hazard assessments
Environ Health Perspect.
2013
121
1002
10
10.1289/ehp.1306638
Pubmed Abstract
CrossRef
Google Scholar
8.
Ankley
GT
Gray
LE
Cross-species conservation of endocrine pathways: a critical analysis of tier 1 fish and rat screening assays with 12 model chemicals
Environ Toxicol Chem.
2013
32
1084
10.1002/etc.2151
Pubmed Abstract
CrossRef
Google Scholar
9.
Held
JW
Peterka
JJ
Age, growth, and food-habits of fathead minnow, pimephales-promelas, in North-Dakota saline lakes
Trans Am Fish Soc.
1974
103
743
56
10.1577/1548-8659(1974)103<743:AGAFHO>2.0.CO;2
CrossRef
Google Scholar
10.
Bardach
JE
Bernstein
JJ
Hart
JS
Brett
JR
Tolerance to temperature extremes: animals. Part IV Fishes In
: Altman PL, Dittmer D.
Environmental Biology.
Bethesda, MD
Federation of American Societies for Experimental Biology
. (
1966
) p.
37
80
Google Scholar
11.
McCarraher
DB
Thomas
Some ecological observations on fathead minnow Pimephales Promelas in alkaline waters of Nebraska
Trans Am Fish Soc.
1968
97
52
10.1577/1548-8659(1968)97[52:SEOOTF]2.0.CO;2
CrossRef
Google Scholar
12.
Flickinger
SA
Determination of sexes in fathead minnow
Trans Am Fish Soc.
1969
98
526
10.1577/1548-8659(1969)98[526:DOSITF]2.0.CO;2
CrossRef
Google Scholar
13.
Cole
KS
Smith
RJF
Male courting behavior in the fathead minnow, Pimephales-Promelas
Environ Biol Fish.
1987
18
235
Google Scholar
14.
Jensen
KM
Korte
JJ
Kahl
MD
Pasha
MS
Ankley
GT
Aspects of basic reproductive biology and endocrinology in the fathead minnow (Pimephales promelas)
Comp Biochem Physiol C Toxicol Pharmacol.
2001
128
127
41
10.1016/S1532-0456(00)00185-X
Pubmed Abstract
CrossRef
Google Scholar
15.
Vignet
Parrott
Maturation of behaviour in the fathead minnow
nProcesses
2017
138
15
21
10.1016/j.beproc.2017.02.004
Pubmed Abstract
CrossRef
Google Scholar
16.
USEPA
Guidelines for the Culture of Fathead Minnows (Pimephales promelas) for Use in Toxicity Tests.
Duluth, MN
1987
).
Google Scholar
17.
Ankley
GT
Villeneuve
DL
The fathead minnow in aquatic toxicology: past, present and future
Aquat Toxicol.
2006
78
91
102
10.1016/j.aquatox.2006.01.018
Pubmed Abstract
CrossRef
Google Scholar
18.
OECD
Test No. 210: Fish early life-stage toxicity test
. In:
OECD Guidelines for the Testing of Chemicals.
Paris
1992
) p.
18
Google Scholar
19.
USEPA
Short-Term Methods for Estimating the Chronic Toxicity of Effluents and Receiving Waters to Freshwater Organisms
3rd ed.
Washington, DC
1994
).
Google Scholar
20.
USEPA
Fish short-term reproduction assay
. In:
Endocrine Disruptor Screening Program Test Guidelines
. Washington, DC (
2009
) p.
93
Google Scholar
21.
Ankley
GT
Jensen
KM
Kahl
MD
Korte
JJ
Makynen
EA
Description and evaluation of a short-term reproduction test with the fathead minnow (Pimephales promelas)
Environ Toxicol Chem.
2001
20
1276
90
10.1002/etc.5620200616
Pubmed Abstract
CrossRef
Google Scholar
22.
Ankley
GT
Kahl
MD
Jensen
KM
Hornung
MW
Korte
JJ
Makynen
EA
et al
Evaluation of the aromatase inhibitor fadrozole in a short-term reproduction assay with the fathead minnow (Pimephales promelas)
Toxicol Sci.
2002
67
121
30
10.1093/toxsci/67.1.121
Pubmed Abstract
CrossRef
Google Scholar
23.
Ankley
GT
Jensen
KM
Makynen
EA
Kahl
MD
Korte
JJ
Hornung
MW
et al
Effects of the androgenic growth promoter 17-beta-trenbolone on fecundity and reproductive endocrinology of the fathead minnow
Environ Toxicol Chem.
2003
22
1350
60
10.1002/etc.5620220623
Pubmed Abstract
CrossRef
Google Scholar
24.
Noyes
PD
Hinton
DE
Stapleton
HM
Accumulation and debromination of decabromodiphenyl ether (BDE-209) in juvenile fathead minnows (Pimephales promelas) induces thyroid disruption and liver alterations
Toxicol Sci
2011
122
265
74
10.1093/toxsci/kfr105
Pubmed Abstract
CrossRef
Google Scholar
25.
Vergauwen
Cavallin
JE
Ankley
GT
Bars
Gabriels
IJ
Michiels
EDG
et al
Gene transcription ontogeny of hypothalamic-pituitary-thyroid axis development in early-life stage fathead minnow and zebrafish
Gen Comp Endocrinol.
2018
266
87
100
10.1016/j.ygcen.2018.05.001
Pubmed Abstract
CrossRef
Google Scholar
26.
Popesku
JT
Tan
EY
Martel
PH
Kovacs
TG
Rowan-Carroll
Williams
et al
Gene expression profiling of the fathead minnow (Pimephales promelas) neuroendocrine brain in response to pulp and paper mill effluents
Aquat Toxicol.
2010
99
379
88
10.1016/j.aquatox.2010.05.017
Pubmed Abstract
CrossRef
Google Scholar
27.
Weinberger
J II
Klaper
Environmental concentrations of the selective serotonin reuptake inhibitor fluoxetine impact specific behaviors involved in reproduction, feeding and predator avoidance in the fish Pimephales promelas (fathead minnow)
Aquat Toxicol.
2014
151
77
83
10.1016/j.aquatox.2013.10.012
Pubmed Abstract
CrossRef
Google Scholar
28.
Smith
LC
Lavelle
CM
Silva-Sanchez
Denslow
ND
Sabo-Attwood
Early phosphoproteomic changes for adverse outcome pathway development in the fathead minnow (Pimephales promelas) brain
Sci Rep.
2018
10212
10.1038/s41598-018-28395-w
Pubmed Abstract
CrossRef
Google Scholar
29.
Olmstead
AW
Villeneuve
DL
Ankley
GT
Cavallin
JE
Lindberg-Livingston
Wehmas
LC
et al
A method for the determination of genetic sex in the fathead minnow, Pimephales promelas, to support testing of endocrine-active chemicals
Environ Sci Technol.
2011
45
3090
10.1021/es103327r
Pubmed Abstract
CrossRef
Google Scholar
30.
Thorpe
KL
Pereira
ML
Schiffer
Burkhardt-Holm
Weber
Wheeler
JR
Mode of sexual differentiation and its influence on the relative sensitivity of the fathead minnow and zebrafish in the fish sexual development test
Aquat Toxicol.
2011
105
412
20
10.1016/j.aquatox.2011.07.012
Pubmed Abstract
CrossRef
Google Scholar
31.
Coulter
DP
Hook
TO
Mahapatra
CT
Guffey
SC
Sepulveda
MS
Fluctuating water temperatures affect development, physiological responses and cause sex reversal in fathead minnows
Environ Sci Technol.
2015
49
1921
10.1021/es5057159
Pubmed Abstract
CrossRef
Google Scholar
32.
Ali
JM
Palandri
MT
Kallenbach
AT
Chavez
Ramirez
Onanong
et al
Estrogenic effects following larval exposure to the putative anti-estrogen, fulvestrant, in the fathead minnow (Pimephales promelas)
Comp Biochem Physiol C Toxicol Pharmacol.
2018
204
26
35
10.1016/j.cbpc.2017.10.013
Pubmed Abstract
CrossRef
Google Scholar
33.
Burns
FR
Cogburn
AL
Ankley
GT
Villeneuve
DL
Waits
Chang
YJ
et al
Sequencing and
de novo
draft assemblies of a fathead minnow (Pimephales promelas) reference genome
Environ Toxicol Chem.
2016
35
212
10.1002/etc.3186
Pubmed Abstract
CrossRef
Google Scholar
34.
Saari
TW
Schroeder
AL
Ankley
GT
Villeneuve
DL
First-generation annotations for the fathead minnow (Pimephales promelas) genome
Environ Toxicol Chem.
2017
36
3436
42
10.1002/etc.3929
Pubmed Abstract
CrossRef
Google Scholar
35.
Garcia-Reyero
Kroll
KJ
Liu
Orlando
EF
Watanabe
KH
Sepulveda
MS
et al
Gene expression responses in male fathead minnows exposed to binary mixtures of an estrogen and antiestrogen
BMC Genomics
2009
10
308
10.1186/1471-2164-10-308
Pubmed Abstract
CrossRef
Google Scholar
36.
Bisesi
JHJr
Robinson
SE
Lavelle
CM
Ngo
Castillo
Crosby
et al
Influence of the gastrointestinal environment on the bioavailability of ethinyl estradiol sorbed to single-walled carbon nanotubes
Environ Sci Technol.
2017
51
948
57
10.1021/acs.est.6b04728
Pubmed Abstract
CrossRef
Google Scholar
37.
Schreiner
Nguyen
TM
Russo
Heber
Patrignani
Ahrne
et al
Targeted combinatorial alternative splicing generates brain region-specific repertoires of neurexins
Neuron
2014
84
386
98
10.1016/j.neuron.2014.09.011
Pubmed Abstract
CrossRef
Google Scholar
38.
Tilgner
Grubert
Sharon
Snyder
MP
Defining a personal, allele-specific, and single-molecule long-read transcriptome
Proc Natl Acad Sci USA.
2014
111
9869
74
10.1073/pnas.1400447111
Pubmed Abstract
CrossRef
Google Scholar
39.
PACBIO
Procedure and Checklist—Isoform Sequencing (Iso-Seq Analysis) using the clontech SMARTer cDNA Synthesis Kit and SageELF Size-selection System
. (
2018
). Available online at:
Google Scholar
40.
Minoche
AE
Dohm
JC
Schneider
Holtgrawe
Viehover
Montfort
et al
Exploiting single-molecule transcript sequencing for eukaryotic gene prediction
Genome Biol.
2015
16
184
10.1186/s13059-015-0729-7
Pubmed Abstract
CrossRef
Google Scholar
41.
Martin
Cutadapt removes adapter sequences from high-throughput sequencing reads
EMBnet J
. (
2011
17.1
10
10.14806/ej.17.1.200
CrossRef
Google Scholar
42.
Grabherr
MG
Haas
BJ
Yassour
Levin
JZ
Thompson
DA
Amit
et al
Full-length transcriptome assembly from RNA-Seq data without a reference genome
Nat Biotechnol.
2011
29
644
52
10.1038/nbt.1883
Pubmed Abstract
CrossRef
Google Scholar
43.
Luo
Liu
Xie
Li
Huang
Yuan
et al
SOAPdenovo2: an empirically improved memory-efficient short-read
de novo
assembler
Gigascience
2012
18
10.1186/2047-217X-1-18
Pubmed Abstract
CrossRef
Google Scholar
44.
Yao
JQ
Yu
DEB: a web interface for RNA-seq digital gene expression analysis
Bioinformation
2011
44
. Available online at:
Pubmed Abstract
Google Scholar
45.
Vizcaino
JA
Cote
RG
Csordas
Dianes
JA
Fabregat
Foster
JM
et al
The PRoteomics IDEntifications (PRIDE) database and associated tools: status in 2013
Nucleic Acids Res.
2013
41
D1063
10.1093/nar/gks1262
Pubmed Abstract
CrossRef
Google Scholar
46.
Weirather
JL
de Cesare
Wang
Piazza
Sebastiano
Wang
XJ
et al
Comprehensive comparison of Pacific Biosciences and Oxford Nanopore Technologies and their applications to transcriptome analysis
F1000Res.
2017
100
10.12688/f1000research.10571.2
Pubmed Abstract
CrossRef
Google Scholar
47.
Rhoads
Au
KF
PacBio sequencing and its applications
Genom Prot Bioinform.
2015
13
278
89
10.1016/j.gpb.2015.08.002
Pubmed Abstract
CrossRef
Google Scholar
48.
Tardaguila
de la Fuente
Marti
Pereira
Pardo-Palacios
FJ
Del Risco
et al
SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification
Genome Res
. (
2018
28
396
411
10.1101/gr.222976.117
Pubmed Abstract
CrossRef
Google Scholar
49.
Bayega
Fahiminiya
Oikonomopoulos
Ragoussis
Current and future methods for mRNA analysis: a drive toward single molecule sequencing
Methods Mol Biol.
2018
1783
209
41
10.1007/978-1-4939-7834-2_11
Pubmed Abstract
CrossRef
Google Scholar
50.
Filby
AL
Tyler
CR
Molecular characterization of estrogen receptors 1, 2a, and 2b and their tissue and ontogenic expression profiles in fathead minnow (Pimephales promelas)
Biol Reprod.
2005
73
648
62
10.1095/biolreprod.105.039701
Pubmed Abstract
CrossRef
Google Scholar
51.
Green
PJ
Walsh
FS
Doherty
Promiscuity of fibroblast growth factor receptors
Bioessays
1996
18
639
46
10.1002/bies.950180807
Pubmed Abstract
CrossRef
Google Scholar
52.
Vasudevan
Pfaff
DW
Non-genomic actions of estrogens and their interaction with genomic actions in the brain
Front Neuroendocrinol.
2008
29
238
57
10.1016/j.yfrne.2007.08.003
Pubmed Abstract
CrossRef
Google Scholar
53.
Mouriec
Gueguen
MM
Manuel
Percevault
Thieulant
ML
Pakdel
et al
Androgens upregulate cyp19a1b (aromatase B) gene expression in the brain of zebrafish (Danio rerio) through estrogen receptors
Biol Reprod.
2009
80
889
96
10.1095/biolreprod.108.073643
Pubmed Abstract
CrossRef
Google Scholar
54.
Yang
Wang
KK
Glial fibrillary acidic protein: from intermediate filament assembly and gliosis to neurobiomarker
Trends Neurosci.
2015
38
364
74
10.1016/j.tins.2015.04.003
Pubmed Abstract
CrossRef
Google Scholar
55.
Garcia
ML
Lobsiger
CS
Shah
SB
Deerinck
TJ
Crum
Young
et al
NF-M is an essential target for the myelin-directed “outside-in” signaling cascade that mediates radial axonal growth
J Cell Biol.
2003
163
1011
20
10.1083/jcb.200308159
Pubmed Abstract
CrossRef
Google Scholar
56.
Hornshoj
Bendixen
Conley
LN
Andersen
PK
Hedegaard
Panitz
et al
Transcriptomic and proteomic profiling of two porcine tissues using high-throughput technologies
BMC Genom.
2009
10
30
10.1186/1471-2164-10-30
Pubmed Abstract
CrossRef
Google Scholar
57.
Ow
SY
Salim
Noirel
Evans
Rehman
Wright
PC
iTRAQ underestimation in simple and complex mixtures: “the good, the bad and the ugly”
J Proteome Res.
2009
5347
55
10.1021/pr900634c
CrossRef
Google Scholar
58.
Lan
Li
WF
Wen
TN
Shiau
JY
Wu
YC
Lin
WD
et al
iTRAQ protein profile analysis of arabidopsis roots reveals new aspects critical for iron homeostasis
Plant Physiol.
2011
155
821
34
10.1104/pp.110.169508
Pubmed Abstract
CrossRef
Google Scholar
59.
Perry
RP
Balanced production of ribosomal proteins
Gene
2007
401
10.1016/j.gene.2007.07.007
Pubmed Abstract
CrossRef
Google Scholar
Summary
Keywords
fathead minnow
transcriptome
proteome
tissue-specific
endocrine system
proteogenomics
Citation
Lavelle C, Smith LC, Bisesi Jr. JH, Yu F, Silva-Sanchez C, Moraga-Amador D, Buerger AN, Garcia-Reyero N, Sabo-Attwood T and Denslow ND (2018)
Tissue-Based Mapping of the Fathead Minnow (
Pimephales promelas
) Transcriptome and Proteome
Front. Endocrinol.
9:611. doi:
10.3389/fendo.2018.00611
Received
05 July 2018
Accepted
26 September 2018
Published
06 November 2018
Volume
9 - 2018
Edited by
Tomer Ventura, University of the Sunshine Coast, Australia
Reviewed by
Shannon William Davis, University of South Carolina, United States; Matthew Brook, University of Edinburgh, United Kingdom
Updates
© 2018 Lavelle, Smith, Bisesi, Yu, Silva-Sanchez, Moraga-Amador, Buerger, Garcia-Reyero, Sabo-Attwood and Denslow.
This is an open-access article distributed under the terms of the
Creative Commons Attribution License (CC BY)
. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Nancy D. Denslow
ndenslow@ufl.edu
This article was submitted to Genomic Endocrinology, a section of the journal Frontiers in Endocrinology
†These authors share first authorship
Disclaimer
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
Article metrics
View details
PDF
ReadCube
EPUB
XML
High-impact AI
AI playbook for researchers
Your step by step support for responsible and impactful AI use
Explore the guide
Outline
Figures
Cite article
Copy to clipboard
Export citation file
BibTex
EndNote
Reference Manager
Simple Text file
Share article
Email
WeChat
Share on WeChat
Scan with WeChat to share this article
Article metrics