Web Scraping: Extracting a News Article Using Python

Web scraping is a technique for extracting data from web pages. It is also commonly called web data extraction or web harvesting. It can be used to extract different types of information, such as text, tables or links, from a website.
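
As a minimal sketch of those three kinds of data, the snippet below parses a small, made-up HTML string (not a real page) with BeautifulSoup and pulls out the paragraph text, the table rows and the link targets. The sample HTML and the variable names are illustrative assumptions. Python's built-in 'html.parser' is used here so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# A made-up HTML snippet standing in for a downloaded page (illustrative only)
sample_html = """
<html><body>
  <p>Startups need funding.</p>
  <table>
    <tr><th>Company</th><th>Year</th></tr>
    <tr><td>Acme</td><td>2015</td></tr>
  </table>
  <a href="/about">About</a>
  <a href="https://example.com/news">News</a>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Textual data: the text inside every <p> tag
paragraphs = [p.get_text() for p in soup.find_all("p")]

# Tables: one list of cell texts per <tr> row (header and data cells alike)
rows = [[cell.get_text() for cell in tr.find_all(["th", "td"])]
        for tr in soup.find_all("tr")]

# Links: the href attribute of every <a> tag that has one
links = [a["href"] for a in soup.find_all("a", href=True)]

print(paragraphs)
print(rows)
print(links)
```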

Applications of Web Scraping
  1. Textual Analytics: Text is extracted from websites and analysed with text analytics techniques to surface relevant information.
  2. Keywords: The keywords a website uses are extracted for SEO and website analytics.
  3. Recruitment and selection: HR departments scrape social media profiles to gather data that is relevant to recruitment and selection.
  4. Research: Academics and students use web scraping to collect data for research.
  5. Lead Generation: Real estate companies scrape user data from various sites to build lists of prospective customers to target.

On Planet Analytics, we will learn how to perform web scraping using Python.

We will extract a news article from the Mint website (livemint.com).

# Import required libraries
>import requests
>from bs4 import BeautifulSoup

>url = 'http://www.livemint.com/Companies/gIzjSNuBykoPFmPIVopk5J/The-secret-sauce-behind-a-successful-Indian-startup.html'
>html = requests.get(url)

>bso = BeautifulSoup(html.text, 'lxml')
>bso.find('title').getText() #Just to scrape the title

>plist = bso.find_all('p') #Tag p has all the content

>textdata = []
>for p in plist:
>    textdata.append(p.getText()) #Collecting the text of each paragraph in a list
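
The steps above can be wrapped into a single reusable function. The sketch below is one possible way to do it, not the only one: the name extract_article is hypothetical, it takes the page's HTML as a string, and it joins the paragraphs into one string separated by newlines. It uses Python's built-in 'html.parser' instead of 'lxml' and is run on a made-up HTML sample so it works without a network connection.

```python
from bs4 import BeautifulSoup

def extract_article(html_text):
    """Return (title, body) for an HTML string.

    Hypothetical helper for illustration; it assumes the article
    body lives in <p> tags, as in the walkthrough above.
    """
    soup = BeautifulSoup(html_text, "html.parser")

    # The page may have no <title> tag, so guard against None
    title_tag = soup.find("title")
    title = title_tag.get_text(strip=True) if title_tag else ""

    # Strip surrounding whitespace and drop paragraphs that end up empty
    paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
    body = "\n".join(text for text in paragraphs if text)
    return title, body

# Made-up sample page (illustrative only)
sample = (
    "<html><head><title> Demo </title></head>"
    "<body><p>First.</p><p> </p><p>Second.</p></body></html>"
)
title, body = extract_article(sample)
print(title)
print(body)
```

With the variables from the walkthrough, the same function could be called as extract_article(html.text) to get the title and the full article text for the page that was requested.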