Google Summer of Code: Midterm Recap

These have been the fastest 6 weeks I have ever experienced, but also the most productive in the development of my project GEO-AWS: Gene Expression Omnibus Analysis for the R Project for Statistical Computing. I have picked up some really great lessons in open-source software development that I either wouldn’t have known about or would have lacked until I taught them to myself (possibly too late in my career, and not efficiently) from assorted resources on the web. The seemingly small tips and tricks from my mentor that I have been implementing in this project will, by the end of the program, have grown into transferable best practices in software development for a Data Scientist.

The first half of the GSoC program took me on an adventure through two key R packages: DT and shinyBS. One of the driving motivations behind my proposal was the lack of functionality and usability in the tools currently used in Bioinformatics. Incorporating the DT package (an R interface to the DataTables JavaScript library) and the shinyBS package (Twitter Bootstrap components for Shiny) took the starter template* I had on Day 1 of coding to its present state: a feature-rich web application in development on GitHub. Style isn’t everything, but what I’ve learned from taking a great MOOC on Udacity by Don Norman is that understanding affordances and signifiers plays an important role when developing a product.

If you head over to the project repo, fork a copy, and launch it in RStudio, here is what you can currently do with the GEO-AWS app:

  1. Download a GEO Accession data set
  2. Select a platform
  3. View the gene expression box plot and decide whether the chosen data set is suitable for comparison in the other analyses by enabling or disabling the log2 data transformation
  4. Select a gene and probe
  5. Select samples and groups, view those parameters, and determine the differential expression across those groups
  6. View the clinical data in both a summary format and the full table (with row selection, and drop-down menus populated with the samples and columns for analyses)
  7. Edit the full clinical data table, with options for case sensitivity and partial matching
  8. Select time, outcome, and x-groups for survival analyses

All of these application features were scheduled for the first 6 weeks of the program in my proposal and were completed in time to pass the midterm evaluations! The other great thing about this project is that someone with little to no experience with R programming or genomics can perform these advanced statistical analyses, which provides greater accessibility and promotes open science! Go give it a try: https://github.com/jasdumas/GEO-AWS

*A note for transparency: This is a project that I have been working on since Summer 2014 with my mentor, as prep for Graduate School. I originally conceived it using the shiny package and developed the initial code and workflow up until January 2015, when I switched roles to Project Coordinator and led an awesome group of CS Undergrads as they contributed code as part of a Senior Bioinformatics independent study project at Eastern Connecticut State University. That lasted until April 2015, before the GSoC coding period began. They are listed as contributors in the About section of the web application.
