This visualization framework aims to address current bottlenecks in the analysis of large sequence datasets (rRNA amplicons, metagenomes), helping researchers analyze high-throughput datasets more efficiently. Phinch takes advantage of standard outputs from computational pipelines to bridge the gap between biological software (e.g. QIIME) and existing data visualization capabilities (harnessing the scalability of WebGL and HTML5 in a browser-based tool).
Phinch currently supports downstream analyses of .biom files (Biological Observation Matrix, a JSON-formatted file type typically used to represent marker gene OTUs or metagenomic data). All sample metadata and taxonomy/ontology information MUST be embedded in the .biom file before it is uploaded into Phinch.
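Because missing metadata is an easy mistake to make, it can help to inspect a .biom file before uploading it. The following is a minimal sketch, not part of Phinch itself, and it assumes the JSON-based BIOM 1.0 layout, in which observations are stored under "rows" and samples under "columns", each entry carrying an "id" and a "metadata" field. The function name check_biom_metadata and the example table are hypothetical.

```python
import json


def check_biom_metadata(biom):
    """Return a list of problems found in a parsed BIOM 1.0 (JSON) table.

    Assumes the BIOM 1.0 layout: "rows" are observations (e.g. OTUs) and
    "columns" are samples; each entry has an "id" and a "metadata" field.
    """
    problems = []
    for row in biom.get("rows", []):
        # Taxonomy/ontology strings are expected in the observation metadata.
        if not row.get("metadata"):
            problems.append(f"observation {row.get('id')!r} has no metadata")
    for col in biom.get("columns", []):
        # Sample metadata (dates, coordinates, etc.) is expected here.
        if not col.get("metadata"):
            problems.append(f"sample {col.get('id')!r} has no metadata")
    return problems


# Hypothetical, hand-written table used only to illustrate the check.
example = {
    "rows": [
        {"id": "OTU_1", "metadata": {"taxonomy": ["k__Bacteria"]}},
        {"id": "OTU_2", "metadata": None},
    ],
    "columns": [
        {"id": "Sample_A", "metadata": {"Latitude": "38.54"}},
    ],
}

for problem in check_biom_metadata(example):
    print(problem)
```

To check a real file, the parsed table could be obtained with json.load() on the opened .biom file and passed to the same function.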
In QIIME (version 1.7 or later), users can prepare the .biom file by executing the following commands:
First, construct an OTU table:
make_otu_table.py -i final_otu_map_mc2.txt -o otu_table_mc2_w_tax.biom -t rep_set_tax_assignments.txt
Where your input file (-i) is your OTU map (defining clusters of raw sequence reads), and your taxonomy file (-t) contains the taxonomy or gene ontology strings that correspond to each OTU.
Second, add your sample metadata to your .biom file. In QIIME version 1.8 this can be done using the following command:
biom add-metadata -i otu_table_mc2_w_tax.biom -o otu_table_mc2_w_tax_and_metadata.biom -m sample_metadata_mapping_file.txt
In QIIME version 1.7 or below, you can add metadata with the following command:
add_metadata.py -i otu_table_mc2_w_tax.biom -o otu_table_mc2_w_tax_and_metadata.biom -m sample_metadata_mapping_file.txt
Where your input file (-i) is your .biom file from the previous step, and your mapping file (-m) is a tab-delimited file containing sample metadata (formatted according to these QIIME instructions).
After these two steps, you're ready to upload.
If you want to visualize biological data currently formatted as a tab-delimited text file (e.g. the style of OTU tables produced by older versions of QIIME, or any other type of genomic data that can be represented in matrix format), please refer to this documentation for conversion instructions. Phinch supports both "sparse" and "dense" BIOM formats (although sparse .biom files are highly recommended, since the file size is much smaller).
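To see why sparse files are smaller, consider how the two layouts store counts. The sketch below assumes the BIOM 1.0 convention, in which a dense table stores every cell as a list of rows while a sparse table stores only non-zero cells as [row, column, value] triplets. The function name dense_to_sparse is hypothetical and the counts are made up for illustration.

```python
def dense_to_sparse(dense):
    """Convert a dense matrix (list of rows) to [row, col, value] triplets.

    Zero cells are dropped, which is where the size saving comes from:
    OTU tables are typically dominated by zeros.
    """
    return [
        [r, c, value]
        for r, row in enumerate(dense)
        for c, value in enumerate(row)
        if value != 0
    ]


# Two observations (rows) across three samples (columns); illustrative only.
dense_counts = [
    [0, 0, 5],
    [3, 0, 0],
]

sparse_counts = dense_to_sparse(dense_counts)
print(sparse_counts)
```

Here six dense cells reduce to two triplets; on a real OTU table with thousands of mostly empty cells the difference is much larger.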
Some important notes on metadata
In order to be properly detected, all date/time metadata must be formatted according to the MIxS standard (more information at the Genomic Standards Consortium wiki), and entered into one column in your original sample metadata mapping file, as follows:
YYYY-MM-DDThh:mm:ssZ
This date format lists the year, month, and day, followed by a 24hr timestamp with a UTC offset (Z). Including the timestamp and UTC offset is optional; metadata columns can contain the date only. For example, metadata for a sample collected at 2:30pm EST on May 4, 2007 would be entered as: 2007-05-04T14:30:00-05:00
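If collection times are already stored in a script or spreadsheet export, a year-month-day string with a 24hr timestamp and UTC offset can be generated rather than typed by hand. The sketch below uses only Python's standard datetime module; the helper name mixs_timestamp is hypothetical, and the fixed -05:00 offset is an assumption standing in for EST.

```python
from datetime import datetime, timedelta, timezone


def mixs_timestamp(moment, date_only=False):
    """Format a datetime as year-month-day, optionally with time and offset."""
    if date_only:
        # Timestamp and UTC offset are optional, so the date alone is valid.
        return moment.date().isoformat()
    return moment.isoformat()


# EST is five hours behind UTC (assumed fixed offset, no daylight saving).
est = timezone(timedelta(hours=-5))
collected = datetime(2007, 5, 4, 14, 30, tzinfo=est)

print(mixs_timestamp(collected))
print(mixs_timestamp(collected, date_only=True))
```

Note that the month comes before the day, so May 4 is written with 05 followed by 04.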
Similarly, any geographic coordinates or GPS data must be entered as decimal degrees (the format used by Google Maps, e.g. -90.017926). We recommend using separate columns labeled “Latitude” and “Longitude” in your original sample metadata mapping file, to ensure that GPS metadata is correctly detected.
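Field notes often record coordinates as degrees, minutes, and seconds, which need converting before they go into the mapping file. A minimal sketch of that conversion follows; the function name dms_to_decimal and the example coordinates are hypothetical, and the sign convention (south and west are negative) is the usual one for decimal degrees.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus N/S/E/W to decimal degrees."""
    value = abs(degrees) + minutes / 60 + seconds / 3600
    # Southern latitudes and western longitudes are negative.
    if hemisphere.upper() in ("S", "W"):
        return -value
    return value


# Illustrative coordinates only: 38°32'24" N, 121°44'24" W.
latitude = round(dms_to_decimal(38, 32, 24, "N"), 6)
longitude = round(dms_to_decimal(121, 44, 24, "W"), 6)
print(latitude, longitude)
```

The two rounded values would then go into the separate “Latitude” and “Longitude” columns.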
To label your samples in Phinch and export graphics with human-readable IDs, include a column in your metadata mapping file with the header “phinchID” (these entries can be the same as or different from those in the first SampleID column). The phinchID values will be pulled through into the visualizations to populate graph axes. If this column is not included, an arbitrary numerical ID will be assigned to each sample.
Please cite the Phinch framework as follows: Bik, H.M., Bu, S., Grubbs, W. (manuscript in preparation) Phinch: An interactive, exploratory data visualization framework for environmental sequence data https://github.com/PitchInteractiveInc/Phinch
Phinch is an open-source framework for visualizing biological data, funded by a grant from the Alfred P. Sloan Foundation. This project represents an interdisciplinary collaboration between Pitch Interactive, a data visualization studio in Oakland, CA, and biological researchers at UC Davis. Whether it's genes, proteins, or microbial species, Phinch provides an interactive visualization tool that allows users to explore and manipulate large biological datasets.
Computer algorithms face significant difficulty in identifying simple data patterns; writing algorithms to tease out complex, subtle relationships (the type that exist in biological systems) is almost impossible. However, the human eye is adept at spotting visual patterns and can quickly notice trends and outliers, especially when presented with intuitive, well-designed software tools and user interfaces. The sheer volume of data produced by high-throughput sequencing technologies will require fundamentally different approaches and new paradigms for effective data analysis. Scientific visualization represents an innovative way of tackling the current bottleneck in bioinformatics; in addition to giving researchers a unique approach for exploring large datasets, it stands to empower biologists to conduct powerful analyses without requiring deep computational knowledge.