iamsudip planetparser 20130715-123946

Posted: 2013-07-15 12:39

The assignment was to display the titles and authors of all the blog posts on Planet Fedora in the terminal, working inside a virtual environment.

So the first step is to create a virtual environment. I made a temporary directory named virtual and changed into it. Now we create and activate the environment:

$ virtualenv virt0
New python executable in virt0/bin/python
Installing distribute.............................................................................................................................................................................................done.
Installing pip...............done.
$ source virt0/bin/activate

Now the terminal prompt will look like:

(virt0)sudip@sudip-mint virtual $
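
Besides looking at the prompt, the interpreter itself can be asked whether it is running inside a virtual environment. The following is only a minimal sketch and not part of the original assignment; the helper name in_virtualenv is made up for illustration. It relies on the fact that older virtualenv releases set sys.real_prefix, while newer releases and the built-in venv module make sys.base_prefix differ from sys.prefix.

```python
import sys

def in_virtualenv():
    """Return True if the interpreter appears to run inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix on the interpreter.
    if hasattr(sys, 'real_prefix'):
        return True
    # venv and newer virtualenv releases keep the original location in sys.base_prefix.
    return getattr(sys, 'base_prefix', sys.prefix) != sys.prefix

if __name__ == '__main__':
    print(in_virtualenv())
```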

Now the environment is created and we are in it. We need a module named BeautifulSoup to do this job. Let us download it:

$ pip install beautifulsoup
Downloading/unpacking beautifulsoup
  Downloading BeautifulSoup-3.2.1.tar.gz
  Running setup.py egg_info for package beautifulsoup

Installing collected packages: beautifulsoup
  Running setup.py install for beautifulsoup

Successfully installed beautifulsoup
Cleaning up...

Now the setup is complete.
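
To double-check the installation before running anything else, one option is to try importing the module by name. This is a small sketch under the assumption that the package installs a module called BeautifulSoup, which is the name the import in the script uses; module_available is a hypothetical helper.

```python
import importlib

def module_available(name):
    """Return True if a module with the given name can be imported."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True

if __name__ == '__main__':
    # 'BeautifulSoup' is the module name imported by the script in this post.
    print(module_available('BeautifulSoup'))
```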

Code

#!/usr/bin/env python
from BeautifulSoup import BeautifulSoup
import urllib2
import sys

def fetch():
    """
    Fetch the html of the Planet Fedora front page
    """

    # Fetch html content from Planet Fedora
    html_cont = urllib2.urlopen('http://planet.fedoraproject.org')
    data = html_cont.read()
    html_cont.close()
    return data

def make_soup(food):
    """
    Parse the html document and return the title and author elements

    :arg food: html data
    """

    # Build a BeautifulSoup object 'soup' from the fetched data
    soup = BeautifulSoup(food)

    # Find all 'div' elements with the matching class attribute
    post_list = soup.findAll('div', attrs={'class' : 'blog-entry-title'})
    author_list = soup.findAll('div', attrs={'class' : 'blog-entry-author'})

    # post_list, author_list: lists of the required elements
    return post_list, author_list

def printem(post_name_list, author_list):
    """
    Print post titles and their respective authors
    """

    # Walk both lists in step, numbering the entries from 1
    pairs = zip(post_name_list, author_list)
    for count, (post_div, author_div) in enumerate(pairs, 1):

        # Take the text of the first link inside each div
        post = post_div.find('a').string
        by_author = author_div.find('a').string

        # Print them
        print str(count) + ': Post Title: ' + post
        print '        Author: ' + by_author + '\n'

if __name__ == '__main__':
    data = fetch()
    post, author = make_soup(data)
    printem(post, author)
    sys.exit(0)
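
The parsing step can also be tried without touching the network. The sketch below is not the script from this post: it assumes Python 3 and swaps BeautifulSoup for the standard library's html.parser, and the sample markup is invented. Only the two class names, blog-entry-title and blog-entry-author, come from the code above; EntryParser and parse_entries are hypothetical names.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; only the two class names come from the script above.
SAMPLE = """
<div class="blog-entry-author"><a href="#">Author One</a></div>
<div class="blog-entry-title"><a href="#">First post</a></div>
<div class="blog-entry-author"><a href="#">Author Two</a></div>
<div class="blog-entry-title"><a href="#">Second post</a></div>
"""

class EntryParser(HTMLParser):
    """Collect the text of the first link inside each title and author div."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self.authors = []
        self._target = None    # list that receives text for the current div
        self._in_link = False  # True while inside the div's first <a>

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get('class') or '').split()
        if tag == 'div':
            if 'blog-entry-title' in classes:
                self._target = self.titles
            elif 'blog-entry-author' in classes:
                self._target = self.authors
        elif tag == 'a' and self._target is not None:
            self._in_link = True
            self._target.append('')

    def handle_data(self, data):
        if self._in_link:
            self._target[-1] += data

    def handle_endtag(self, tag):
        # Leaving the link or the div ends collection for this entry.
        if tag in ('a', 'div'):
            self._in_link = False
            self._target = None

def parse_entries(html):
    """Return a list of (title, author) pairs found in the html string."""
    parser = EntryParser()
    parser.feed(html)
    parser.close()
    return list(zip(parser.titles, parser.authors))

if __name__ == '__main__':
    for number, (title, author) in enumerate(parse_entries(SAMPLE), 1):
        print('%d: Post Title: %s' % (number, title))
        print('        Author: %s\n' % author)
```

Pairing titles and authors by position, as the original script does, only works while every entry carries both elements.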

Link to code

planetparser.py

How to execute code

Run the above script (after making it executable with chmod +x planetparser.py) like:

$ ./planetparser.py

or:

$ python planetparser.py

Example output

Example output is given below:

1: Post Title: Week-end hacks
    Author: Bastien Nocera

2: Post Title: kernel news – 15.07.2013
    Author: Rares Aioanei

3: Post Title: morituri 0.2.1 “married” released
    Author: Thomas Vander Stichele

4: Post Title: Fedora 19 With Google-authenticator login
    Author: Onuralp SEZER

5: Post Title: Alistando Fedora 19 Release Party Managua
    Author: Neville A. Cross - YN1V

6: Post Title: How to run Pidora in QEMU
    Author: Ruth Suehle