Answer to Question #266272 in Python for sindhu

Question #266272

Write a PYTHON script

Store https://www.genecards.org/cgi-bin/cardlisttxt.pl in a flat file.

The GeneCards database currently contains 270,168 GeneCards

Parse the first 10 genes from each series (1A9N_Q-ZZZ3) using https://www.genecards.org/cgi-bin/carddisp.pl?gene=GENE NAME

If a series has fewer than 10 genes, parse all of them.

Extract Genomic Locations for GENE NAME and store them in a file for each gene you parse. For example:

Open https://www.genecards.org/cgi-bin/carddisp.pl?gene=A1BG

Scrape “Genomic Locations for A1BG Gene”; you will see:

Genomic Locations for A1BG Gene

chr19:58,345,178-58,353,492(GRCh38/hg38)

Size:8,315 bases

Orientation:Minus strand
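
The sample output above can be turned into structured fields once the text has been scraped. The following is a minimal sketch that assumes the scraped text has exactly the three-line shape shown in the question; the function name `parse_genomic_location` and the dictionary keys are hypothetical, and the real page markup may need extra cleanup before this step.

```python
import re

# Pattern for the location line, e.g. "chr19:58,345,178-58,353,492(GRCh38/hg38)".
# Assumption: the scraped text follows the layout shown in the question.
LOCATION_RE = re.compile(
    r"(?P<chrom>chr[0-9A-Za-z_]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)"
    r"\s*\((?P<assembly>[^)]+)\)"
)


def parse_genomic_location(text):
    """Return a dict of location fields, or None if no location line is found."""
    match = LOCATION_RE.search(text)
    if match is None:
        return None
    size = re.search(r"Size:\s*([\d,]+)\s*bases", text)
    orientation = re.search(r"Orientation:\s*(\w+)\s+strand", text)
    return {
        "chromosome": match.group("chrom"),
        # Strip thousands separators before converting to int.
        "start": int(match.group("start").replace(",", "")),
        "end": int(match.group("end").replace(",", "")),
        "assembly": match.group("assembly"),
        "size": int(size.group(1).replace(",", "")) if size else None,
        "orientation": orientation.group(1) if orientation else None,
    }


sample = """chr19:58,345,178-58,353,492(GRCh38/hg38)
Size:8,315 bases
Orientation:Minus strand"""
print(parse_genomic_location(sample))
```

For this sample, `end - start + 1` equals the reported size of 8,315 bases, which is a quick sanity check on the parsed coordinates.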

Store the scraped output in a file and render it in HTML as it looks on GeneCards.
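
One possible way to handle the rendering step is to write a small standalone HTML page per gene. This is only a sketch: the function name `render_location_html`, the output file name `A1BG.html`, and the plain heading-plus-paragraph layout are assumptions, and it does not reproduce the actual GeneCards styling.

```python
import html


def render_location_html(gene, location, size, orientation):
    """Build a small standalone HTML page for one gene's genomic location."""
    # Escape every value, since scraped text may contain characters such as < or &.
    gene, location, size, orientation = (
        html.escape(value) for value in (gene, location, size, orientation)
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f'<head><meta charset="utf-8"><title>{gene}</title></head>\n'
        "<body>\n"
        f"<h2>Genomic Locations for {gene} Gene</h2>\n"
        f"<p>{location}</p>\n"
        f"<p>Size: {size}</p>\n"
        f"<p>Orientation: {orientation}</p>\n"
        "</body>\n"
        "</html>\n"
    )


# Values taken from the A1BG example in the question.
page = render_location_html(
    "A1BG", "chr19:58,345,178-58,353,492(GRCh38/hg38)", "8,315 bases", "Minus strand"
)

# Hypothetical output file name: one HTML file per gene.
with open("A1BG.html", "w", encoding="utf-8") as handle:
    handle.write(page)
```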

Expert's answer

SOLUTION TO THE ABOVE QUESTION

SOLUTION CODE

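import requests
import html


# Fetch the full GeneCards gene list and return it as text.
def gene_card_request():
    # The list endpoint is https://www.genecards.org/cgi-bin/cardlisttxt.pl
    # and takes no "gene" query parameter.
    url_to_request = 'https://www.genecards.org/cgi-bin/cardlisttxt.pl'
    # Send a browser-like User-Agent with the request.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
    r = requests.get(url_to_request, headers=headers, timeout=30)
    # Stop on HTTP errors instead of saving an error page.
    r.raise_for_status()
    # Decode HTML entities such as &amp; in the response body.
    gene_card_html = html.unescape(r.text)
    return gene_card_html


# Store the list in a flat file, as the question asks.
# The file name is an arbitrary choice.
with open('genecards_list.txt', 'w', encoding='utf-8') as f:
    f.write(gene_card_request())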
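
The code above covers only the download step. For the "first 10 genes from each series" requirement, a minimal sketch follows. It rests on two assumptions that the question does not confirm: that the list contains one gene symbol per line, and that a "series" means all symbols sharing the same first character (so the range 1A9N_Q-ZZZ3 spans the series "1" through "Z"). The helper names `first_genes_per_series` and `gene_card_url` are hypothetical.

```python
from collections import defaultdict
from urllib.parse import quote


def first_genes_per_series(lines, limit=10):
    """Group gene symbols by first character, keeping at most `limit` per group."""
    series = defaultdict(list)
    for line in lines:
        symbol = line.strip()
        if not symbol:
            continue  # skip blank lines
        # Assumption: a series is identified by the symbol's first character.
        key = symbol[0].upper()
        if len(series[key]) < limit:
            series[key].append(symbol)
    return dict(series)


def gene_card_url(symbol):
    """Build the per-gene page URL given in the question."""
    return "https://www.genecards.org/cgi-bin/carddisp.pl?gene=" + quote(symbol)


# Small illustration with a handful of symbols and a limit of 2.
symbols = ["1A9N_Q", "A1BG", "A1CF", "A2M", "ZZZ3"]
for key, genes in first_genes_per_series(symbols, limit=2).items():
    print(key, [gene_card_url(gene) for gene in genes])
```

A series with fewer symbols than the limit is returned whole, which matches the "parse all" rule in the question.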
