How to Scrape Using Beautiful Soup and Requests in Python
Instagram Scraping Using BeautifulSoup and Requests in Python
In the previous article, Web Scraping using Python Part-1, we had a brief discussion of web scraping.
Here, we’ll apply the same tools and process, which work on any static web page. In this article, we will build a scraper that fetches an Instagram user's profile data: the page title and the number of followers, followings, and posts.
Before starting the project, I would suggest reading my article Web Scraping using Python Part-1 for a better understanding of scraping in Python. So, let’s see how we can grab a user's Instagram data using Python. Before scraping any site, you must understand the structure of the site you want to scrape. So first explore the website and then choose one of its static pages. The steps to follow for scraping any website using Python are as follows:
Step 1: Import the libraries.
Step 2: Perform the HTTP request.
Step 3: Parse the HTML content with BeautifulSoup and store the result in a Python object.
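The three steps above can be sketched on a small inline HTML string, so the parsing step can be tried without a network connection. The helper names fetch_html and parse_title, the sample markup, and the 10-second timeout are assumptions made for illustration; they are not part of the scraper built in this article.

```python
# Step 1: import the libraries
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Step 2: perform the HTTP request and return the response body as text."""
    response = requests.get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.text


def parse_title(html):
    """Step 3: parse the HTML with BeautifulSoup and read the <title> text."""
    soup = BeautifulSoup(html, "html.parser")
    # soup.title is None when the page has no <title> element.
    if soup.title is None:
        return None
    return soup.title.get_text(strip=True)


# Hypothetical sample page used in place of a live request.
SAMPLE_HTML = "<html><head><title>Example Page</title></head><body></body></html>"
print(parse_title(SAMPLE_HTML))  # prints: Example Page
```

For a live page, the two helpers would be chained as parse_title(fetch_html(url)).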
Now, let’s see the code to grab the user's Instagram data. For this we have created a function called ‘InstaScrap’, which follows all three steps above and uses the find method to grab the user's Instagram details. When the main block calls this function, it prints the text content of the title tag and returns the content of the og:description meta tag, which holds the number of followers, followings, and posts.
Code:
# Import the libraries
import requests
from bs4 import BeautifulSoup

# Function that scrapes the profile information
def InstaScrap(user):
    # str.format() is a built-in string method: it replaces the {} placeholder
    # in the URL template with the given username.
    url = "https://www.instagram.com/{}".format(user)
    # 'r' holds the response received from the HTTP request.
    r = requests.get(url)
    # Parse r.text with BeautifulSoup and store the result in the Python object 'bs'.
    bs = BeautifulSoup(r.text, "html.parser")
    # Hold the title element of the page.
    title = bs.title
    # Use the find method to locate the og:description meta tag and read its content attribute.
    rep = bs.find('meta', property='og:description').attrs["content"]
    # Print the text content of the 'title' element.
    print(title.text)
    # Return the description text.
    return rep

# Main block
if __name__ == "__main__":
    # Get input from the user
    user = input("username: \t")
    # Print the scraped details
    print(InstaScrap(user))
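One detail of the find call is worth keeping in mind: find returns None when no matching tag exists, so chaining .attrs on the result raises an AttributeError on pages that lack the og:description tag. Below is a minimal sketch of a more defensive lookup, run on a made-up HTML snippet. The function name get_og_description and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup


def get_og_description(html):
    """Return the content of the og:description meta tag, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("meta", attrs={"property": "og:description"})
    if tag is None:
        # No such tag on this page, so there is nothing to read.
        return None
    # .get() returns None instead of raising if the content attribute is missing.
    return tag.get("content")


# Made-up markup that imitates the shape of the tag the article relies on.
SAMPLE_HTML = (
    '<html><head><title>Example</title>'
    '<meta property="og:description" '
    'content="120 Followers, 80 Following, 15 Posts - See photos from Example">'
    '</head><body></body></html>'
)

print(get_og_description(SAMPLE_HTML))
print(get_og_description("<html><head></head></html>"))  # prints: None
```

Returning None lets the caller decide what to do when the tag is missing, instead of crashing inside the scraper.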
Now let's improve the above code by defining one more function that extracts the followers, followings, and posts counts.
Here, we use split() to break the given string (here it is ‘data’) into a list of strings at the specified separator. We then create a dictionary to hold the followers, followings, and posts values. Let’s see the code and the output.
Code:
# Import the libraries
import requests
from bs4 import BeautifulSoup

# Function that extracts info like followers and followings
def parse(data):
    # Keep only the part before the first '-', then split it on spaces.
    res = data.split('-')[0]
    parts = res.split(" ")

    # Create a dictionary to hold the details.
    info = {}
    info['followers'] = parts[0]
    info['followings'] = parts[2]
    info['post'] = parts[4]
    return info

# Function that scrapes the profile information
def InstaScrap(user):
    # str.format() is a built-in string method: it replaces the {} placeholder
    # in the URL template with the given username.
    url = "https://www.instagram.com/{}".format(user)
    # 'r' holds the response received from the HTTP request.
    r = requests.get(url)

    # Parse r.text with BeautifulSoup and store the result in the Python object 'bs'.
    bs = BeautifulSoup(r.text, "html.parser")
    # Hold the title element of the page.
    title = bs.title
    # Use the find method to locate the og:description meta tag and read its content attribute.
    rep = bs.find('meta', property='og:description').attrs["content"]
    # Print the text content of the 'title' element.
    print(title.text)
    # Return the parsed details as a dictionary.
    return parse(rep)

# Main block
if __name__ == "__main__":
    # Get input from the user
    user = input("username: \t")
    # Print the scraped details
    print(InstaScrap(user))
Output: The above code prints the text content of the title tag, followed by a dictionary whose keys are followers, followings, and post and whose values are the corresponding counts.
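The split-based parse function depends on the exact word order and spacing of the description text. As a hedged alternative, the sketch below uses a regular expression that looks up each count by its label. It assumes the description has the shape "<n> Followers, <n> Following, <n> Posts - ...", which is inferred from the indices used in parse. The name parse_counts, the pattern, the lowercase label keys, and the sample strings are all hypothetical.

```python
import re

# A count (digits with optional separators and a k/m suffix) followed by its label.
COUNT_PATTERN = re.compile(r"(\d[\d.,]*[kKmM]?)\s+(Followers|Following|Posts)")


def parse_counts(description):
    """Map each label found in the description to its count, as strings."""
    # Only the part before " - " carries the counts in the assumed format.
    summary = description.split(" - ")[0]
    info = {}
    for value, label in COUNT_PATTERN.findall(summary):
        info[label.lower()] = value
    return info


# Made-up description text in the assumed format.
sample = "1,234 Followers, 567 Following, 89 Posts - See photos from Example"
print(parse_counts(sample))
# prints: {'followers': '1,234', 'following': '567', 'posts': '89'}
```

Unlike fixed list indices, this version simply omits a key when its label is missing, so the caller can check for it rather than hit an IndexError.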
You can scrape any static website using these Python packages. If you know HTML and its tags, you can extract the information you need from a page. I hope you liked this article; for more interesting articles on Python, stay tuned. If you liked it, don’t forget to share it with your friends.
Till then, Happy Learning🙂🙂
 
    

 
                    
