Interactive data mining and visualization tools in sequencing services, 3 months, full time, Summer 2015, based in Essex.
Project Title: Interactive data mining and visualization tools in sequencing services
Supervisor: Efi Athieniti
Advances in next-generation sequencing technologies and decreasing costs have led to the rapid expansion of sequencing labs and an ever-increasing throughput of genomic data. A major challenge in sequencing laboratories is the quality control (QC) of DNA data at different stages of the sequencing process.
While automation is the ultimate goal for most standardized lab processes, automated QC is a bigger challenge. Sample preparation and sequencing methods improve rapidly, and lab processes depend on many inputs, such as material quality, machine quality and human input. Quality can be improved by constantly monitoring laboratory test data and analyzing it scientifically with real-time data mining and visualization tools, so that the right decisions can be taken.
These tools are essential for managers, bioinformaticians and lab users, allowing them to make these decisions at the right time on the basis of analyzed data. By bringing together information from every stage of the sequencing workflow, the tools will let users look at the data from many different dimensions or angles, categorize it, and summarize the relationships identified. Most importantly, these tools should be easy to use and interpret, not only for bioinformaticians and data analysts but also for managers and the lab personnel who are most involved in these processes.
The purpose of this internship is to allow a student of computer science or software engineering to develop such data mining and visualization tools with rich interactive graphics and charts. The student is invited to explore both the types of data to be used in such a tool (e.g. genomic data, metadata and lab procedure details) and the scientific visualization techniques for extracting and displaying this information.
This is an opportunity to work on production genomic analysis and workflow management software in a collaborative environment of bioinformaticians, analysts, software engineers and lab scientists.
The successful candidate will have a strong computer science background with experience in Python and in web development with HTML/JavaScript. API development experience is preferred, and an understanding of relational databases and SQL is useful. Nice-to-have qualifications include any previous experience with data processing, analysis and data modelling. Strong logical thinking, problem-solving skills, a creative mind and strong communication skills are important.
When applying for this placement, please remember that it is based in Essex (Little Chesterford, 10 miles south of Cambridge).
Apply online now to start your e-Placement Scotland journey with Illumina.