TPR Biotech Brief: Percent Identity

May 4th, 2022

Percent Identity in Sequence Searching

In this issue of the TPR Biotech Brief, we look at ‘percent identity’ in sequence searching by exploring the following:

What is meant by ‘percent identity’?
Why can every alignment have more than one type of ‘percent identity’?
What are ‘subject percent identity’ and ‘query percent identity’?
Why is looking at more than ‘alignment percent identity’ helpful?

Understanding percent identity

Let’s get started with the basics: what is percent identity?

Percent identity is a key parameter in evaluating the results of any sequence search. It measures how closely a subject sequence (a sequence returned by the search) matches the query, that is, your own or your client’s sequence. In brief, the higher the percent identity, the better the match.

For a closer look, consider an example. Figure 1 below shows an alignment of a 60 amino acid query and a 645 amino acid subject sequence. The alignment length is 57 amino acids, with one mismatch, and each percent identity is calculated from these values.

Figure 1: Sequence Alignment

Alignment Length = 57 aa

Mismatches = 1
Matches = 56
Query Length = 60
Subject Length = 645

%ID alignment = 56/57 (98%)
%ID subject = 56/645 (9%)
%ID query = 56/60 (93%)

As can be seen, there are several ways to calculate percent identity, and each calculation reveals something different about your results. Many databases, such as the National Library of Medicine’s NCBI database, assign a percent identity using the following equation:

%identity = (# of matches)/(alignment length)

This calculation, also known as alignment percent identity, tells you how well your sequence matches the subject sequence over the region in which they have been aligned. In Example 1 below, the percent identity calculated in this manner is 100%, but as you can see, the query is actually much shorter than the subject.
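As a minimal sketch of this calculation, the Python function below counts matching positions in a pair of aligned sequences and divides by the alignment length. The function name and the example sequences are hypothetical, and the handling of gaps (a gap column counts toward the alignment length but never as a match) is an assumption made for this illustration; individual tools may treat gaps differently.

```python
def alignment_percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Return (# of matches) / (alignment length) as a percentage.

    Assumption: both strings are rows of the same alignment, so they have
    equal length, with '-' marking a gap. Gap columns count toward the
    alignment length but are never counted as matches.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"
    )
    return 100 * matches / len(aligned_query)


# Hypothetical 10-residue alignment with a single mismatch (K vs. R).
print(alignment_percent_identity("MKTAYIAKQR", "MKTAYIARQR"))  # 90.0
```

Note that this value says nothing about how long the query or the subject is outside the aligned region.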

Example 1:

What are ‘subject’ percent identity and ‘query’ percent identity?

Other databases also calculate subject percent identity and query percent identity. These two parameters are calculated as follows:

Subject %ID = (# Matches)/(Subject Length)

Query %ID = (# Matches)/(Query Length)

These two parameters give you a clearer picture of how well the match covers the entire length of the subject or of the query. In some cases, one or both of these percent identities will equal the alignment percent identity, but not always.
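To make the three formulas concrete, the short Python sketch below recomputes the Figure 1 values. The function name `percent_identities` and the rounding to whole percentages are choices made for this illustration only, not part of any particular search tool.

```python
def percent_identities(matches, alignment_length, query_length, subject_length):
    """Return alignment, subject, and query percent identity as percentages."""
    return {
        "alignment": 100 * matches / alignment_length,
        "subject": 100 * matches / subject_length,
        "query": 100 * matches / query_length,
    }


# Values taken from Figure 1: 56 matches over a 57 aa alignment,
# with a 60 aa query and a 645 aa subject.
figure_1 = percent_identities(
    matches=56, alignment_length=57, query_length=60, subject_length=645
)
for name, value in figure_1.items():
    print(f"%ID {name} = {value:.0f}%")
```

Rounded to whole numbers, this gives 98%, 9%, and 93%, matching Figure 1.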

Why looking at more than alignment percent identity is helpful

In the case of Example 1, the query percent identity will equal the alignment percent identity, but the subject percent identity will be lower because the subject is much longer than the query. By comparison, in Example 2 below, the subject percent identity and alignment percent identity are equal, but the query percent identity is lower.

Example 2:

Taking the above into consideration, and depending on the type of sequence you are looking for, it may be worth reviewing more than just the alignment percent identity, as the subject and query percent identities give you more context with regard to each sequence in its entirety.
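The contrast between the two situations can be sketched with made-up numbers. In the Python example below, the `Hit` class, its field names, and both hits are hypothetical; they are only patterned on the cases described for Example 1 (a short query aligned within a long subject) and Example 2 (a short subject aligned within a long query), and are not taken from the original examples.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One search hit; all field names here are hypothetical."""
    name: str
    matches: int
    alignment_length: int
    query_length: int
    subject_length: int

    def identities(self) -> dict:
        """Alignment, query, and subject percent identity for this hit."""
        return {
            "alignment": 100 * self.matches / self.alignment_length,
            "query": 100 * self.matches / self.query_length,
            "subject": 100 * self.matches / self.subject_length,
        }


def weakest_identity(hit: Hit) -> str:
    """Name of the lowest of the three percent identities for a hit."""
    ids = hit.identities()
    return min(ids, key=ids.get)


# Made-up hits, each aligned perfectly over 50 residues.
hits = [
    Hit("short query, long subject", 50, 50, 50, 500),
    Hit("long query, short subject", 50, 50, 500, 50),
]

for hit in hits:
    ids = hit.identities()
    summary = ", ".join(f"{k} {v:.0f}%" for k, v in ids.items())
    print(f"{hit.name}: {summary} (lowest: {weakest_identity(hit)})")
```

Both hits have an alignment percent identity of 100%, so only the subject and query values reveal which sequence extends beyond the aligned region.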

Final Considerations

When it comes to sequence searching, your search partner should understand and consider all of these parameters. Sequence searching is an art that takes many years of experience to master. At TPR, our Biotech Search Experts have extensive experience as professional searchers within top R&D corporations at the forefront of biotechnology. Their deep grounding in the science, combined with many years of patent search experience, enables our team to perform more robust searches and provide meaningful data insights, leading to better outcomes for our clients.

If you would like to learn more about this topic or other topics relating to sequence searching, please contact the TPR search team at 1.858.592.9084 or searches@TPRinternational.com.

All of the parameters explained in this brief, and more, are included as standard in most TPR sequence search reports. If you have a specific request regarding sequence searches in general, or the information you would like to see in your report, please ask the TPR search team; we are here to help and share our knowledge.

