The assignment was to display the title and author of every blog post on Planet Fedora in the terminal, working inside a virtual environment.
So, for this work we first have to create a virtual environment. I created a temporary directory named virtual and I am already in it. Now we create and activate the environment:
$ virtualenv virt0
New python executable in virt0/bin/python
Installing distribute.................................................done.
Installing pip...............done.
$ source virt0/bin/activate
Now the terminal prompt will look like:
(virt0)sudip@sudip-mint virtual $
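Besides the changed prompt, the interpreter itself can tell us whether it is running from the environment. The snippet below is only a small sketch and not part of the assignment; the helper name in_virtualenv is made up for illustration. It relies on older virtualenv releases setting sys.real_prefix, and on the built-in venv module and newer virtualenv releases making sys.base_prefix differ from sys.prefix.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter appears to run inside a virtual environment."""
    # Older virtualenv releases add sys.real_prefix to the interpreter.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases: base_prefix differs from prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("inside a virtual environment: %s" % in_virtualenv())
    print("prefix: %s" % sys.prefix)
```

When run with the environment activated, the prefix should point into the virt0 directory.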
Now the environment is created and we are in it. We need a module named BeautifulSoup to do this job. Let us download it:
$ pip install beautifulsoup Downloading/unpacking beautifulsoup Downloading BeautifulSoup-3.2.1.tar.gz Running setup.py egg_info for package beautifulsoup Installing collected packages: beautifulsoup Running setup.py install for beautifulsoup Successfully installed beautifulsoup Cleaning up...
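To double-check that the package really landed inside the environment, we can try importing it. This is a hedged sketch: the helper name is_installed is hypothetical, and it simply treats an ImportError as "not installed". Note that it actually imports the module, so it is meant for a quick manual check only.

```python
def is_installed(module_name):
    """Return True if module_name can be imported by this interpreter."""
    try:
        __import__(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # 'BeautifulSoup' is the module name that the script imports.
    print("BeautifulSoup importable: %s" % is_installed("BeautifulSoup"))
```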
Now the setup is complete. Here is the script, saved as planetparser.py:
#!/usr/bin/env python
from BeautifulSoup import BeautifulSoup
import urllib2
import sys

def fetch():
    """
    Function to fetch data from url
    """

    #Fetching html content from Planet Fedora
    html_cont = urllib2.urlopen('http://planet.fedoraproject.org')
    data = html_cont.read()
    html_cont.close()
    return data

def make_soup(food):
    """
    Function to parse the html document and return the lists of
    post title and author elements

    :arg food: html data
    """

    #Using the fetched data, BeautifulSoup gives a BeautifulSoup object as 'soup'
    soup = BeautifulSoup(food)

    #Finding all the 'div' elements with the given class attribute
    post_list = soup.findAll('div', attrs={'class' : 'blog-entry-title'})
    author_list = soup.findAll('div', attrs={'class' : 'blog-entry-author'})

    #post_list, author_list: Lists of required data
    return post_list, author_list

def printem(post_name_list, author_list):
    """
    Function to print post titles and their respective authors
    """

    #Initialized counter
    count = 0

    #Finding how many list elements there are
    length = len(post_name_list)

    #Looping over both post titles and corresponding authors
    while count < length:

        #Finding the text
        post = post_name_list[count].find('a').string
        by_author = author_list[count].find('a').string

        #Printing them
        count += 1
        print str(count) + ': Post Title: ' + post
        print ' Author: ' + by_author + '\n'

if __name__ == '__main__':
    data=fetch()
    post, author=make_soup(data)
    printem(post, author)
    sys.exit(0)
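The script above targets Python 2 with BeautifulSoup 3. To see the extraction step on its own, here is a rough sketch that swaps in html.parser from the Python 3 standard library instead of BeautifulSoup. The class name EntryParser and the sample markup are made up for illustration; only the two CSS class names come from the script. It assumes each wanted div holds a single link and does not handle nested divs.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; the real page is larger and may be structured differently.
SAMPLE_HTML = """
<div class="blog-entry-author"><a href="#">Alice Example</a></div>
<div class="blog-entry-title"><a href="#">First post</a></div>
<div class="blog-entry-author"><a href="#">Bob Example</a></div>
<div class="blog-entry-title"><a href="#">Second post</a></div>
"""


class EntryParser(HTMLParser):
    """Collect the link text found inside the title and author divs."""

    WANTED = {"blog-entry-title": "titles", "blog-entry-author": "authors"}

    def __init__(self):
        super().__init__()
        self.titles = []
        self.authors = []
        self._target = None    # list being filled for the current div, if any
        self._in_link = False  # True while inside an <a> within a wanted div

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            name = self.WANTED.get(dict(attrs).get("class"))
            self._target = getattr(self, name) if name else None
        elif tag == "a" and self._target is not None:
            self._in_link = True
            self._target.append("")

    def handle_data(self, data):
        if self._in_link:
            self._target[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False
        elif tag == "div":
            self._in_link = False
            self._target = None


parser = EntryParser()
parser.feed(SAMPLE_HTML)
print(parser.titles)
print(parser.authors)
```

BeautifulSoup is far more forgiving with messy real-world HTML, which is why the script uses it; the sketch only shows the idea of picking elements by their class attribute.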
Run the above script directly (it must be executable, for example after chmod +x planetparser.py):
$ ./planetparser.py
or:
$ python planetparser.py
Example output is given below:
1: Post Title: Week-end hacks
 Author: Bastien Nocera

2: Post Title: kernel news – 15.07.2013
 Author: Rares Aioanei

3: Post Title: morituri 0.2.1 “married” released
 Author: Thomas Vander Stichele

4: Post Title: Fedora 19 With Google-authenticator login
 Author: Onuralp SEZER

5: Post Title: Alistando Fedora 19 Release Party Managua
 Author: Neville A. Cross - YN1V

6: Post Title: How to run Pidora in QEMU
 Author: Ruth Suehle
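The same numbered layout can also be built with enumerate and zip in place of a manual counter. This is just a sketch of an alternative, not what the script does: the function name format_entries is hypothetical, and it takes plain strings rather than BeautifulSoup elements. One difference in behaviour is that zip stops at the shorter list, so a missing author ends the output early instead of raising an IndexError.

```python
def format_entries(titles, authors):
    """Return numbered 'Post Title / Author' blocks, one per (title, author) pair."""
    lines = []
    # zip stops at the shorter list, so unequal lengths cannot raise IndexError.
    for number, (title, author) in enumerate(zip(titles, authors), start=1):
        lines.append("%d: Post Title: %s" % (number, title))
        lines.append(" Author: %s\n" % author)
    return "\n".join(lines)


if __name__ == "__main__":
    # Titles and authors taken from the example output.
    print(format_entries(
        ["Week-end hacks", "How to run Pidora in QEMU"],
        ["Bastien Nocera", "Ruth Suehle"],
    ))
```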