Question

Access any flat file from NCBI (the NCBI home page is
http://www.ncbi.nlm.nih.gov). Decode all the information given in the
accessed file:

•       What does the first line indicate?

•       What is the nature of the sequence?

•       Identify the version.

•       Is the data you have accessed a coding sequence or an
open reading frame? Which are the start and stop codons?

•       Does it have untranslated regions?

•       Has it been linked to the protein database? If so, how
many amino acids does the protein have? What is its accession number?

•       Is the information published?

Accepted Answer

The NCBI Reference Sequence (RefSeq) project provides sequence records
and related information for numerous organisms and serves as a
baseline for medical, functional, and comparative studies.

The distinct accession number format, which begins with two characters
followed by an underscore (e.g., NP_), is the most distinguishing
feature of a RefSeq record. An underscore is never included in an
INSDC accession number.
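
The underscore rule above, together with the version suffix that
follows the dot in an accession such as NM_002084.3, can be checked with
a few lines of Python. This is only a minimal sketch: the pattern is a
simplification assumed for illustration, and the function names
is_refseq_accession and split_version are hypothetical, not part of any
NCBI tool.

```python
import re

# Simplified, assumed pattern: two uppercase letters, an underscore,
# an alphanumeric body, and an optional ".version" suffix.
REFSEQ_PATTERN = re.compile(r"^[A-Z]{2}_[A-Z0-9]+(\.\d+)?$")


def is_refseq_accession(accession: str) -> bool:
    """Return True if the accession has the RefSeq-style underscore prefix."""
    return bool(REFSEQ_PATTERN.match(accession.strip()))


def split_version(accession: str):
    """Split 'NM_002084.3' into ('NM_002084', 3); version is None if absent."""
    base, _, version = accession.strip().partition(".")
    return base, (int(version) if version else None)


if __name__ == "__main__":
    print(is_refseq_accession("NM_002084.3"))  # RefSeq-style accession
    print(split_version("NM_002084.3"))        # base accession and version
```

The same split also answers the "identify the version" part of the
question: the number after the dot is the sequence version.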

NCBI creates and updates RefSeq records from sequence data available
through the INSDC.

Reference sequence records are not intended to reflect the historical
first sequenced record of a gene, although this is frequently the case
for genes with very limited sequence data. Until a RefSeq record has
been fully reviewed, PROVISIONAL records may be automatically revised
to use a longer INSDC source nucleotide sequence when one becomes
available.

All INSDC submissions used to create a RefSeq record are listed in the
COMMENT field of the flat file record.
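
To read the COMMENT field (or the first line, LOCUS) out of a flat file
saved as text, something like the following sketch may help. It assumes
the usual flat file layout, in which a keyword starts in the first
column and its continuation lines are indented. The function name
extract_section and the SAMPLE record are invented for illustration;
SAMPLE is not a real NCBI record.

```python
def extract_section(flatfile_text: str, keyword: str) -> str:
    """Return the text of one top-level field, joined into a single line."""
    collected = []
    inside = False
    for line in flatfile_text.splitlines():
        if line.startswith(keyword):
            # First line of the field: drop the keyword itself.
            inside = True
            collected.append(line[len(keyword):].strip())
        elif inside and line.startswith(" "):
            # Indented continuation line of the same field.
            collected.append(line.strip())
        elif inside:
            # A new keyword in the first column ends the field.
            break
    return " ".join(collected)


# Toy record, made up for this example only.
SAMPLE = """\
LOCUS       NM_EXAMPLE   120 bp    mRNA    linear   PRI 01-JAN-2000
COMMENT     PROVISIONAL REFSEQ: toy text. The reference sequence was
            derived from XX000000.1.
FEATURES             Location/Qualifiers
"""

if __name__ == "__main__":
    print(extract_section(SAMPLE, "LOCUS"))
    print(extract_section(SAMPLE, "COMMENT"))
```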

The GPX3 gene (GeneID 2878) encodes a protein that contains the amino
acid selenocysteine. Selenocysteine is encoded by the codon 'tga',
which is normally read as a stop codon. The COMMENT block of
NM_002084.3 reports the RefSeq Attribute 'protein contains
selenocysteine', and the translation exception qualifier on the CDS
feature specifies the position of the stop codon that encodes
selenocysteine (which appears as a 'U' in the amino acid sequence).
The position of the selenocysteine codon or amino acid residue is also
annotated as a misc_feature or Site feature for transcripts and
proteins in the Bilateria group.
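
The effect of a translation exception can be illustrated with a small
translator. This is a sketch under stated assumptions, not the way NCBI
produces its translations: the function translate is hypothetical, the
toy CDS is invented (it is not the GPX3 sequence), and exception
positions are given as 1-based offsets within the CDS rather than as
record coordinates.

```python
from itertools import product

# Standard genetic code, with codons ordered by the bases t, c, a, g.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(cds: str, exceptions=None) -> str:
    """Translate a CDS, stopping at the first stop codon.

    exceptions maps the 1-based start position of a codon within the CDS
    to the residue that should be used there instead (e.g. {7: "U"}).
    """
    exceptions = exceptions or {}
    cds = cds.lower()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        if i + 1 in exceptions:
            # Translation exception: use the annotated residue.
            protein.append(exceptions[i + 1])
            continue
        residue = CODON_TABLE[cds[i:i + 3]]
        if residue == "*":
            break  # ordinary stop codon ends translation
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    toy_cds = "atggcttgaaaataa"  # atg gct tga aaa taa (invented)
    print(translate(toy_cds))            # tga read as a stop codon
    print(translate(toy_cds, {7: "U"}))  # tga read as selenocysteine
```

In the toy sequence, 'atg' is the start codon and 'taa' the stop codon.
The in-frame 'tga' ends translation early unless it is listed as an
exception, in which case it is reported as 'U'.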

RefSeq information is maintained, curated, and published, and it is in
the public domain.
