Bioinformatics

Bioinformatics is a scientific field that analyzes biological data. How is raw data processed? What are the FASTQ, BAM, and VCF formats?

What is Bioinformatics? What Does It Do?

The term "bioinformatics" combines the words "bio" and "informatics." In essence, it is a scientific field that processes and interprets data generated in the life sciences. Computational approaches to biology existed earlier, but the field expanded rapidly with the Human Genome Project. The project began in 1990 and produced DNA sequencing data on an unprecedented scale. Analyzing that data required new tools and methods, and in the years that followed bioinformatics became an established scientific discipline.

Today, bioinformatics encompasses methods for analyzing data from many areas of biology. However, the largest share of the data produced is DNA sequence data, especially from next-generation sequencing (NGS). A large portion of bioinformatics work focuses on this area.

What is Raw Data? What Type of Data is Obtained Technically?

"Raw data" refers to biological data that has not yet been processed. Modern instruments generate very large volumes of data, and in raw form this data is difficult to interpret directly. Bioinformatics tools process the raw data into biologically meaningful results.

Different applications in biology produce data in different formats. For example, microarray studies generate data in formats different from those used in DNA sequence analysis. Today, however, DNA sequence analysis produces the largest volume of data.

Each sequencing platform writes its own native output, but DNA sequencing data is usually converted into common file formats that do not depend on the device or platform used. The most widely used formats are FASTQ, BAM, and VCF.

A FASTQ file contains the raw sequencing reads. Each record stores a read identifier, the sequence of bases, and a quality score for every base. BAM is the compressed binary form of the SAM format. It contains aligned data, that is, reads that have been mapped onto a reference genome. A VCF (Variant Call Format) file lists the genetic variants identified by comparing the aligned reads with the reference genome. These three formats are widely used in bioinformatics analyses.
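The following Python sketch makes the FASTQ structure concrete. It reads four-line FASTQ records and decodes the quality line. It assumes the common Phred+33 encoding (quality score = ASCII code minus 33) and records that are not wrapped across lines. The names `FastqRecord`, `parse_fastq`, and `error_probability` are illustrative and do not belong to any particular library. A real pipeline would normally rely on an established parser.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class FastqRecord:
    """One FASTQ entry: read name, bases, and decoded per-base quality scores."""
    name: str
    sequence: str
    qualities: List[int]


def parse_fastq(text: str) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text, assuming four lines per record (no wrapping)."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text must contain a multiple of four lines")
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record number {i // 4 + 1}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lines differ in length")
        # Assumed Phred+33 encoding: score = ASCII code of the character minus 33
        scores = [ord(ch) - 33 for ch in quality]
        # Keep only the first word of the header line as the read name
        parts = header[1:].split()
        yield FastqRecord(parts[0] if parts else "", sequence, scores)


def error_probability(phred: int) -> float:
    """Estimated probability that a base call is wrong: 10^(-Q/10)."""
    return 10 ** (-phred / 10)
```

For example, the quality character `I` decodes to Q40 and `5` decodes to Q20. A score of Q20 corresponds to an estimated error probability of 1 in 100.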

What Types of Raw Data Services Does Intergen Provide?

At Intergen, we provide raw data services at every stage of DNA sequence analysis studies. These services cover data from NGS (next-generation sequencing) as well as from Sanger DNA sequencing.

These services can be delivered at various stages. Some clients bring their samples to us and ask us to handle the entire workflow from start to finish. Others carry out part of the analysis themselves and bring their samples to us at a specific stage so that we can complete the remaining steps.

In summary, we can produce and deliver raw data at every stage, tailored to customer requests, for both NGS and Sanger sequencing methods.

What Data is Obtained in Exome and Genome Studies?

Exome and genome studies are large-scale analyses, so they generate large amounts of data. The delivered data may consist of raw sequencing files (FASTQ), alignment files (BAM), or variant files that summarize the genetic variations found (VCF).
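A VCF file is plain text. Header lines start with `#`. Each data line has eight fixed tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), optionally followed by per-sample genotype columns.

The Python sketch below parses only those eight fixed columns. The `Variant` class and the function names are hypothetical. Compressed files, FORMAT and sample columns, and typed INFO values are deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Variant:
    """The eight fixed columns of one VCF data line."""
    chrom: str
    pos: int                # 1-based position on the reference
    variant_id: str         # "." when no identifier is given
    ref: str                # reference allele
    alts: List[str]         # alternate allele(s), comma-separated in the file
    qual: Optional[float]   # None when QUAL is "."
    filters: List[str]      # e.g. ["PASS"]; empty when FILTER is "."
    info: Dict[str, Union[str, bool]]


def parse_vcf_line(line: str) -> Variant:
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    chrom, pos, variant_id, ref, alt, qual, filt, info = cols[:8]
    info_fields: Dict[str, Union[str, bool]] = {}
    if info != ".":
        for entry in info.split(";"):
            key, has_value, value = entry.partition("=")
            # Flag entries (e.g. "DB") have no "=value" part
            info_fields[key] = value if has_value else True
    return Variant(
        chrom=chrom,
        pos=int(pos),
        variant_id=variant_id,
        ref=ref,
        alts=alt.split(","),
        qual=None if qual == "." else float(qual),
        filters=[] if filt == "." else filt.split(";"),
        info=info_fields,
    )


def parse_vcf(text: str) -> List[Variant]:
    """Parse all data lines, skipping '##' meta-information and the '#CHROM' header."""
    return [parse_vcf_line(row) for row in text.splitlines()
            if row.strip() and not row.startswith("#")]
```

Consider a data line on chr1 at position 12345 with REF `A`, ALT `G,T`, QUAL `50`, FILTER `PASS`, and INFO `DP=30;DB`. It yields one `Variant` with two alternate alleles, quality 50.0, and the INFO entries `{"DP": "30", "DB": True}`.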

Bioinformatics analysis can be divided into three stages.

Primary analysis is generally performed by the sequencing instrument itself. Base calling converts the raw signals into read sequences with quality scores, and the result is a set of FASTQ files.

In secondary analysis, the reads in the FASTQ files are aligned to a reference genome, producing BAM files. Variants are then identified from these alignments and written to VCF files.

Tertiary analysis integrates the VCF files with clinical information. This produces insights that can explain an individual’s clinical condition, help determine the prognosis of a disease, or reveal the person’s genetic risks.
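A deliberately simplified Python sketch can illustrate the secondary stage. It works in three steps:

1. Each read is placed at the reference offset with the fewest mismatches. There are no gaps, no index, and no reverse-strand handling.
2. The placed bases are stacked into a pileup.
3. A single-nucleotide variant is reported wherever most reads disagree with the reference.

The function names and thresholds (`max_mismatches`, `min_depth`, `min_fraction`) are hypothetical choices made for illustration. Production aligners and variant callers use indexed, gapped alignment and statistical genotyping models, and they write their results to BAM and VCF files.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def align_read(read: str, reference: str) -> Tuple[int, int]:
    """Return (0-based offset, mismatches) of the best ungapped placement.

    Ties go to the leftmost offset; (-1, len(read) + 1) means no placement.
    """
    best_offset, best_mismatches = -1, len(read) + 1
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches < best_mismatches:
            best_offset, best_mismatches = offset, mismatches
    return best_offset, best_mismatches


def call_snvs(reads: List[str], reference: str, max_mismatches: int = 2,
              min_depth: int = 3, min_fraction: float = 0.8) -> List[Tuple[int, str, str]]:
    """Toy SNV caller returning (1-based position, ref base, alt base) tuples."""
    pileup: Dict[int, Counter] = defaultdict(Counter)
    for read in reads:
        offset, mismatches = align_read(read, reference)
        if offset < 0 or mismatches > max_mismatches:
            continue  # treat as unmapped and leave it out of the pileup
        for i, base in enumerate(read):
            pileup[offset + i][base] += 1
    calls = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        base, support = counts.most_common(1)[0]
        if (base != reference[pos] and depth >= min_depth
                and support / depth >= min_fraction):
            # Report 1-based coordinates, as VCF does
            calls.append((pos + 1, reference[pos], base))
    return calls
```

As an example, take the reference `AGCTTGACCGTAATG` and the reads `CTTGAT`, `TGATCG`, `ATCGTA`, and `GGGGGG`. The first three reads overlap position 8, and all of them carry T where the reference has C, so `call_snvs` reports `(8, "C", "T")`. The fourth read cannot be placed within two mismatches and is ignored.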

As a result of these processes, genetic data becomes clinically interpretable, providing important insights into an individual’s health status.