My Summer as a Data Science Intern at PCCI

While back in my hometown of Dallas for the summer, I’ve been interning at Parkland Center for Clinical Innovation (PCCI) as a Data Science Intern. During my interview with Albert and Vikas, we discussed some issues with the representation of data in the current healthcare system. Hospitals use different coding systems in their electronic medical records (EMRs), making communication between hospitals and care providers difficult. A while ago, a new health data standard called FHIR (Fast Healthcare Interoperability Resources, pronounced “fire”) was proposed. My project this summer had three goals: determining whether our data could be easily transformed into the new FHIR format, carrying out the transformation, and building predictive models using the new FHIR data.

Situated on the 11th floor of the building, PCCI is a very chill place to work. Quiet spaces are easily found at desks and conference rooms scattered around the office. As an intern, I sit on the “Intern Island” with (usually) 6 other interns. I like this space because we get two monitors and a Lenovo Thinkpad.

 

Emily Wang, PCCI Data Science Intern

As for work, each PCCI project usually consists of one project manager, a clinical expert, and a data scientist. The intern projects are no different; Aaron was the FHIR Project Manager Intern, and Mila was the FHIR Clinical Intern. Both had important but separate duties that helped our project succeed.

As the Data Science Intern on the FHIR project, my first responsibility was converting the data into FHIR resources. This involved dusting off Java knowledge from several years ago! There were definitely some issues figuring out how to add the right dependencies because Java can get complicated very quickly. A few days were spent just getting oriented with Java and Eclipse, and making sure all the necessary packages for FHIR were installed.

We were working with two years of data. This roughly translates into 27 million (!) vitals and 17 million labs, and each vital and lab was converted into its own separate file. I quickly realized that there would be no space on my laptop to hold all of these files, so we decided to enlist the help of Microsoft Azure. With Azure, the task became less difficult, but still, the hardest part of my summer was working with such huge numbers of files.
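For readers who have not seen a FHIR resource, here is a minimal sketch of what turning one vital-sign row into a FHIR Observation might look like. It is an illustration only: the actual conversion was done in Java, while this sketch uses Python's standard library for brevity. The column names (`patient_id`, `loinc_code`, and so on), the sample row, and the function name are all hypothetical and do not come from the PCCI data.

```python
import json


def vital_to_observation(row):
    """Map one flat vital-sign record (a dict) to a FHIR Observation dict.

    The keys read from `row` are hypothetical column names chosen for
    illustration, not the schema used in the real project.
    """
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [{
                "system": "http://loinc.org",
                "code": row["loinc_code"],
                "display": row["name"],
            }]
        },
        "subject": {"reference": f"Patient/{row['patient_id']}"},
        "effectiveDateTime": row["taken_at"],
        "valueQuantity": {
            "value": row["value"],
            "unit": row["unit"],
            "system": "http://unitsofmeasure.org",
            "code": row["unit"],
        },
    }


# Made-up example row: a heart-rate reading (LOINC 8867-4).
row = {
    "patient_id": "example-1",
    "loinc_code": "8867-4",
    "name": "Heart rate",
    "value": 72,
    "unit": "/min",
    "taken_at": "2000-01-01T08:30:00Z",
}

observation = vital_to_observation(row)
print(json.dumps(observation, indent=2))
```

Writing each such dictionary to its own JSON file is what produces one file per vital or lab, which is why the file count grows so quickly with millions of rows.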

Caught up in the huge task of transforming vast amounts of data to FHIR resources, I left very little time in my internship to work on actual data science. Out of the approximately 13 weeks total, about six weeks were spent converting the table-format EMR data into FHIR resources, five weeks were spent parsing the FHIR resources into a format suitable for machine learning, and the remaining two weeks were dedicated to model building. Looking back, I would definitely work harder to cut short the resource conversion in favor of more time for data science.
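The parsing step mentioned above runs in the opposite direction: nested FHIR resources are flattened back into rows that a model can consume. The following is a rough sketch of that idea under stated assumptions, not the code used in the project. The sample Observation, the output column names, and the function name are hypothetical.

```python
import json

# A made-up FHIR Observation, as it might be read from a single JSON file.
sample = """
{
  "resourceType": "Observation",
  "status": "final",
  "code": {"coding": [{"system": "http://loinc.org", "code": "8867-4"}]},
  "subject": {"reference": "Patient/example-1"},
  "effectiveDateTime": "2000-01-01T08:30:00Z",
  "valueQuantity": {"value": 72, "unit": "/min"}
}
"""


def observation_to_row(obs):
    """Flatten one Observation dict into a single-level dict (one table row)."""
    coding = obs["code"]["coding"][0]
    quantity = obs.get("valueQuantity", {})
    return {
        # "Patient/example-1" -> "example-1"
        "patient_id": obs["subject"]["reference"].split("/")[-1],
        "loinc_code": coding["code"],
        "value": quantity.get("value"),
        "unit": quantity.get("unit"),
        "taken_at": obs.get("effectiveDateTime"),
    }


flat_row = observation_to_row(json.loads(sample))
print(flat_row)
```

Rows like this can then be collected into a table in R or Python for model building.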

 

“As an intern, I sit on the ‘Intern Island’ with (usually) 6 other interns. I like this space because we get two monitors and a Lenovo Thinkpad.” – Emily Wang

As a Data Science Intern at PCCI, you have the freedom to work in any language you want; the full-time Data Science team is evenly divided between R and Python. There’s also a lot of freedom in deciding which path your project takes. Your supervisor will point you in a general direction and state goals and expectations, but is otherwise very lenient!

Don’t be shy about asking people around the office for advice and help, even if they’re not on your project team! Even though most people are busy with various meetings, they will gladly schedule a 30-minute or even hour-long block to discuss your project privately with you.

When presenting your project, whether it’s a progress update or a final presentation, expect multiple questions from the audience. It’s not that they want to quiz you on your knowledge and preparation, but that they’re genuinely curious and care about understanding what you’re doing over the summer.

A 30-minute lunch break is mandatory every day. I recommend bringing lunches that can stay in the fridge for several days (like salad) or not bringing anything, because there are often team lunches and random outings during the day. Occasionally there’s leftover pizza or sandwiches from lunch meetings in the big conference room, or leftover burritos from breakfast.

I enjoy the diverse atmosphere at PCCI the most. The three teams (Data Science, Project Management, and Clinical) work together so well. It’s a very fluid system. A data scientist with a question about the best intervention methods for patients with diabetes can easily walk over to a clinical team member and get an answer within minutes. Even though you’re employed as a data scientist, you have access to a whole host of medical knowledge from the clinical team and connections from the project management team.

My biggest takeaway from this internship is the importance of long-term time management and collaboration. Manage your time well and you’ll be able to at least touch on everything you wanted to learn during your internship. Collaborate with as many people as you can, so that you not only learn much more but also gain friends and connections along the way.