GVis: the scalable visualization framework

GVis (A Scalable Visualization Framework for Genomic Data) is a framework for browsing the phylogenetic hierarchy of organisms, from the highest level down to an individual organism of interest, and for analyzing each gene of interest by launching the gene-finding and gene-matching analysis tools. The framework lets users navigate and explore large amounts of genomic data (thousands of genomes or more) using a 2.5D space layout.

All genomic data used in the GVis framework follow the NCBI GenBank flat-file format. The publicly available GenBank files are a set of ASCII text files; most contain gene sequence data, and some contain supplemental information such as lists of author names, journal citations, gene names, keywords, and accession numbers of the records. By extracting several important features from the GenBank files, we create our own binary GVis data files.
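
The extraction step can be illustrated with a minimal sketch. It assumes that only the accession number, organism name, and sequence are kept, and it writes them as length-prefixed fields. The function names and the binary layout are hypothetical and chosen for illustration; the actual GVis data file format is not described here.

```python
import struct

FIELDS = ("accession", "organism", "sequence")

def parse_genbank_record(text):
    """Pull a few fields out of one GenBank flat-file record (simplified)."""
    rec = {"accession": "", "organism": "", "sequence": ""}
    seq_parts = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):            # record terminator
            break
        if in_origin:
            # Sequence lines look like "        1 atgaaacgct aa"; drop the counter.
            seq_parts.extend(tok for tok in line.split() if not tok.isdigit())
        elif line.startswith("ACCESSION"):
            parts = line.split()
            rec["accession"] = parts[1] if len(parts) > 1 else ""
        elif line.startswith("  ORGANISM"):
            rec["organism"] = line[len("  ORGANISM"):].strip()
        elif line.startswith("ORIGIN"):
            in_origin = True
    rec["sequence"] = "".join(seq_parts).upper()
    return rec

def pack_record(rec):
    """Hypothetical binary layout: each field as uint32 length + UTF-8 bytes."""
    out = b""
    for key in FIELDS:
        data = rec[key].encode("utf-8")
        out += struct.pack("<I", len(data)) + data
    return out

def unpack_record(blob):
    """Inverse of pack_record."""
    rec, offset = {}, 0
    for key in FIELDS:
        (n,) = struct.unpack_from("<I", blob, offset)
        offset += 4
        rec[key] = blob[offset:offset + n].decode("utf-8")
        offset += n
    return rec

# Made-up record used only to exercise the sketch.
SAMPLE = """LOCUS       EXAMPLE1      12 bp    DNA     linear   BCT 01-JAN-2000
ACCESSION   EX000001
SOURCE      Example organism
  ORGANISM  Example organism
ORIGIN
        1 atgaaacgct aa
//
"""

record = parse_genbank_record(SAMPLE)
blob = pack_record(record)
```

A real GenBank record carries many more sections (FEATURES, REFERENCE, and so on); a production parser would need to handle those and multi-line fields.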

A genomic tree structure is built by referencing the NCBI Taxonomy database. Taxonomic information is retrieved by querying the NCBI Taxonomy Browser over HTTP with specific organism names.
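
As a rough sketch of the tree-building step, the code below assumes the lineage of each organism has already been retrieved as a list of names ordered from root to organism. The HTTP retrieval itself is omitted, and the nested-dictionary representation and function names are assumptions for illustration, not the GVis data structure.

```python
def build_taxonomy_tree(lineages):
    """Merge root-to-organism lineages into one nested-dict tree."""
    root = {}
    for lineage in lineages:
        node = root
        for name in lineage:
            # Reuse the child if it exists, otherwise create it.
            node = node.setdefault(name, {})
    return root

def count_leaves(tree):
    """Number of leaf nodes (organisms) below this node."""
    if not tree:
        return 1
    return sum(count_leaves(child) for child in tree.values())

# Abbreviated lineages, shortened for illustration.
lineages = [
    ["Bacteria", "Proteobacteria", "Escherichia coli"],
    ["Bacteria", "Firmicutes", "Bacillus subtilis"],
    ["Archaea", "Euryarchaeota", "Methanocaldococcus jannaschii"],
]
tree = build_taxonomy_tree(lineages)
```

Shared prefixes collapse into a single branch, so both bacterial organisms end up under one "Bacteria" node.
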
ORFs (Open Reading Frames) of genomic sequences are collected using the NCBI ORF Finder. An ORF represents the minimum selectable size of a gene sequence, and it includes a start codon and one or more stop codons. With a collection of ORFs, users can easily select the minimum size of selectable sequences and compare the results with other ORF sequences. The collected ORFs are displayed using an ORF tree structure.
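
To make the idea of an ORF concrete, here is a simplified sketch that scans the three forward reading frames for stretches running from an ATG start codon to the first in-frame stop codon. It is not the NCBI ORF Finder algorithm: it ignores the reverse strand and alternative start codons, and the `find_orfs` name and `min_len` parameter are assumptions for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_len=6):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    start is the index of the ATG, end is one past the stop codon,
    and min_len is the minimum ORF length in bases (stop codon included).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start >= min_len:
                    orfs.append((start, end, frame))
                start = None                   # look for the next ORF
    return sorted(orfs)

example = "CCATGAAATAGGATGCCCTGA"
orfs = find_orfs(example)
```

Raising `min_len` filters out short ORFs, which mirrors the idea of letting the user choose the minimum size of selectable sequences.
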

For displaying genomic data, a Venn-diagram approach is used instead of directly referencing the NCBI tree structure.

We also integrate netBLAST, a network-based gene sequence matching tool from NCBI. netBLAST is a publicly available program that emphasizes regions of local alignment in order to detect relationships among sequences that share isolated regions of similarity. Using an inner window, a user can select a gene sequence of arbitrary length and submit it to netBLAST. Depending on the length and type of the query sequence, netBLAST may return more than ten matching sequences of any length. Because the user cannot know a priori the number or the lengths of the resulting sequences, we implement navigation features within the inner window, such as zooming, panning, and scrolling, to allow efficient visualization of the results.
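
The zooming and panning described above can be sketched as a one-dimensional viewport that maps sequence positions to pixel coordinates. This is only an illustration under assumed names (`SequenceViewport`, `to_pixel`, `pan`, `zoom`); it does not reproduce the GVis implementation and does not call netBLAST.

```python
from dataclasses import dataclass

@dataclass
class SequenceViewport:
    start: float     # first visible base position
    end: float       # one past the last visible base position
    total: int       # length of the longest result sequence
    width_px: int    # pixel width of the inner window

    def to_pixel(self, pos):
        """Map a base position to an x coordinate in the window."""
        return (pos - self.start) / (self.end - self.start) * self.width_px

    def pan(self, delta):
        """Shift the visible range by delta bases, clamped to [0, total]."""
        span = self.end - self.start
        self.start = min(max(self.start + delta, 0), self.total - span)
        self.end = self.start + span

    def zoom(self, factor, anchor):
        """Zoom by factor (>1 zooms in), keeping anchor at the same pixel."""
        old_span = self.end - self.start
        span = min(old_span / factor, self.total)
        rel = (anchor - self.start) / old_span   # anchor's relative position
        self.start = anchor - rel * span
        self.end = self.start + span
        self.pan(0)                              # clamp back into range

vp = SequenceViewport(start=0, end=1000, total=1000, width_px=500)
vp.zoom(2, anchor=500)   # now shows bases 250..750
```

Keeping the anchor fixed during zoom means the base under the cursor stays put, which makes it easier to inspect matches whose lengths are not known in advance.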

Published Papers

  • Dong Hyun Jeong, Soo-Yeon Ji, Tera Greensmith, Byunggu Yu, and Remco Chang, Understanding Implicit and Explicit Interface Tools to Perform Visual Analytics Tasks, IEEE 15th International Conference on Information Reuse & Integration (IRI), pp.687-694, 2014.
  • Dong Hyun Jeong, Tera Marie Green, William Ribarsky, and Remco Chang, Comparative Evaluation of Two Interface Tools in Performing Visual Analytics Tasks, BELIV'10: BEyond time and errors: novel evaLuation methods for Information Visualization (A Workshop of the ACM CHI 2010), 2010
  • Tera Marie Green, Dong Hyun Jeong, and Brian Fisher, Using personality factors to predict interface learning performance, In Proceedings of the Hawaii International Conference on System Sciences, Koloa, Hawaii, pp.1-10, 2010. (Best Paper Award)
  • Dong Hyun Jeong, Tera Marie Green, William Ribarsky, and Remco Chang, Comparing Two Interface Tools in Performing Visual Analytics Tasks, IEEE Symposium on Visual Analytics Science and Technology 2009 (VAST '09), Poster, pp.219-220, 2009.
  • Jin Hong, Dong Hyun Jeong, Chris D Shaw, William Ribarsky, Mark Borodovsky, and Chang Song, GVis: A Scalable Visualization Framework for Genomic Data, Joint Eurographics - IEEE VGTC Symposium on Visualization 2005 (EuroVis '05), pp.191-198, 2005.
  • Dong Hyun Jeong, William Ribarsky, Larry Hodges, and Chang Song, Interactive Visualization of Genomic Data, AppliedVis 2005. Poster. April, 2005.