.. link: http://dgplug.org/summertraining/2013/posts/supriyasaha-planetparser-20130715-155758.html
.. description: 
.. tags: 
.. date: 2013/07/15 15:57:59
.. title: supriyasaha planetparser 20130715-155758
.. slug: supriyasaha-planetparser-20130715-155758

The virtual environment is already installed, so working in it only requires activating it and opening a new file for the script.

Steps
-----

::

    [supriya@localhost] ~ $ cd virtual
    virtual $ source virt1/bin/activate
    (virt1) $ vim b.py

The script prints the title and the author name of each blog post on the Fedora Planet site, ``http://planet.fedoraproject.org``.


Running the file as ``./b.py`` (it must be executable, for example via ``chmod +x b.py``) produces the output below::

    (virt1)$ ./b.py

Output
------

::

    title: CatN | CentOS Dojo
    author: Richard W.M. Jones

    title: Daily log July 11th 2013
    author: Dave Jones


Code of the program
-------------------

.. code:: python

 #!/usr/bin/env python
 from bs4 import BeautifulSoup
 import urllib2

 url = 'http://planet.fedoraproject.org'

 html_doc = urllib2.urlopen(url)  # open the page; returns a file-like response
 data = html_doc.read()  # read the raw HTML of the page
 html_doc.close()  # the response is no longer needed once it has been read

 soup = BeautifulSoup(data)  # parse the HTML

 # each post title sits in a 'div' tag with class='blog-entry-title'
 titles = soup.findAll('div', attrs={'class': 'blog-entry-title'})
 # each author name sits in a 'div' tag with class='blog-entry-author'
 authors = soup.findAll('div', attrs={'class': 'blog-entry-author'})

 # pair every title with its author; zip stops at the shorter list
 for title, author in zip(titles, authors):
     print "title: %s " % title.find('a').string  # title of the post
     print "author: %s" % author.find('a').string  # author of the post


The code is also available at: https://github.com/supriyasaha/hometaskrepo/blob/master/planetparser/planetparser.py
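
The extraction step of the program above can also be tried without network access. The following is only a rough sketch under several assumptions: it is written for Python 3, it swaps BeautifulSoup for the standard library's ``HTMLParser``, and the names ``PlanetParser``, ``extract_posts``, ``fetch`` and ``SAMPLE`` are hypothetical. The sample HTML is invented to mirror the two class names used in the script; the real page markup may differ.

.. code:: python

 from html.parser import HTMLParser
 from urllib.request import urlopen  # Python 3 counterpart of urllib2.urlopen

 # Invented sample that mimics the assumed page structure
 SAMPLE = """
 <div class="blog-entry-title"><a href="#">CatN | CentOS Dojo</a></div>
 <div class="blog-entry-author"><a href="#">Richard W.M. Jones</a></div>
 <div class="blog-entry-title"><a href="#">Daily log July 11th 2013</a></div>
 <div class="blog-entry-author"><a href="#">Dave Jones</a></div>
 """


 class PlanetParser(HTMLParser):
     """Collect the text of the first link inside each title/author div."""

     # maps a div class name to the attribute that stores its link texts
     WANTED = {'blog-entry-title': 'titles', 'blog-entry-author': 'authors'}

     def __init__(self):
         super().__init__()
         self.titles = []
         self.authors = []
         self._target = None    # list to fill while inside a wanted div
         self._in_link = False  # True while inside the <a> being captured
         self._text = []

     def handle_starttag(self, tag, attrs):
         if tag == 'div':
             # the class attribute may hold several space-separated names
             for name in (dict(attrs).get('class') or '').split():
                 if name in self.WANTED:
                     self._target = getattr(self, self.WANTED[name])
         elif tag == 'a' and self._target is not None:
             self._in_link = True
             self._text = []

     def handle_data(self, data):
         if self._in_link:
             self._text.append(data)

     def handle_endtag(self, tag):
         if tag == 'a' and self._in_link:
             self._target.append(''.join(self._text).strip())
             self._in_link = False
             self._target = None
         elif tag == 'div':
             # simplification: nested divs inside a wanted div are not handled
             self._target = None


 def extract_posts(html):
     """Return a list of (title, author) pairs found in the HTML string."""
     parser = PlanetParser()
     parser.feed(html)
     parser.close()
     return list(zip(parser.titles, parser.authors))


 def fetch(url):
     """Download a page as text; UTF-8 encoding is an assumption."""
     with urlopen(url) as response:
         return response.read().decode('utf-8', errors='replace')


 for title, author in extract_posts(SAMPLE):
     print("title: %s" % title)
     print("author: %s" % author)

To run the sketch against the live site, ``extract_posts(fetch('http://planet.fedoraproject.org'))`` could be used in place of ``extract_posts(SAMPLE)``, provided the page still uses the same class names.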