Updates: My quest to become a Data Scientist


Since my last post, things have been progressing well in my journey. I have completed the Coursera.org course in Reproducible Research from the Johns Hopkins University Specialization. That leaves just four more courses to complete. My plan is to take a break in October and resume coursework in November. I’m learning a lot of valuable skills from these MOOCs (R programming, getting data from APIs, finding free datasets on the web, and data visualization) and finding networking opportunities in the LinkedIn group. The 5th course in the series was probably the easiest, since it stressed what I saw as common sense from a scientific method point of view. Still, whether those practices are implemented can be the difference between “good” research and research nobody cares about because they can’t disprove, improve, or reproduce your methodology.

My time during October will be concentrated on completing (or at least partially completing) my Open Source Pre-Graduate Research Project. The project is still a work in progress, and my hope is that it will be completed within the next few months (a standard semester-length project). The project is an adaptation of the NIH GEO2R gene expression analyzer. It’s been challenging to mix my introductory bioinformatics knowledge with my introductory R programming skills. I’m encountering some milestones and hang-ups along the way, but I’m attacking each portion of the project week by week.

My To-Do List so far:

  1. Understand / learn R Programming
  2. Understand the purpose and usage of the NIH GEO2R tool
  3. Conceptualize / research a web application tool to work around the not-so-user-friendly features of the GEO2R tool
  4. Understand / learn the Shiny application framework
    1. Develop a clean user interface that incorporates multiple views of a plot, expression data, and series info
    2. Develop reactive programming for users to input the GEO gene expression numbers to evaluate
    3. Develop a method of selecting individual rows and then adding them to a group to perform comparative analysis
    4. Provide an output plot of clinical data (male vs. female expression, Kaplan-Meier chart of mortality, etc.). This part isn’t finalized yet.
  5. Develop a layout that encompasses all of these features
  6. Launch the application via shinyapps.io or on the Bioinformatics Page at Eastern Connecticut State University

It’s truly exciting that I have completed so much, and I can’t wait to share this project with everyone. My research meetings are occurring more frequently, and I’m seeing my comfort level in R programming rise. The project should make for a strong feature on my applications to graduate schools, in addition to my professional development in Data Science.

Figure: Sample screenshot of my research project (GEO2R clinical data analyzer)
