.. link: http://dgplug.org/summertraining/2013/posts/iamsudip-planetparser-20130715-123946.html .. description: .. tags: .. date: 2013/07/15 12:39:47 .. title: iamsudip planetparser 20130715-123946 .. slug: iamsudip-planetparser-20130715-123946 ============= planetparser ============= The assignment was to display all the blog post title and author from `Planet Fedora `_ in the terminal using a virtual environment. So, for this work first we have to create a virtual environment. First created a temporary directory like *virtual* and I am already in it. Now we are creating and activating the env: :: $ virtualenv virt0 New python executable in virt0/bin/python Installing distribute.............................................................................................................................................................................................done. Installing pip...............done. $ source virt0/bin/activate Now the termianl will look like:: (virt0)sudip@sudip-mint virtual $ Now the environment is created and we are in it. We need a module named BeautifulSoup to do this job. Let us download it:: $ pip install beautifulsoup Downloading/unpacking beautifulsoup Downloading BeautifulSoup-3.2.1.tar.gz Running setup.py egg_info for package beautifulsoup Installing collected packages: beautifulsoup Running setup.py install for beautifulsoup Successfully installed beautifulsoup Cleaning up... Now setup is complete. Code ---- .. code:: python :number-lines: #!/usr/bin/env python from BeautifulSoup import BeautifulSoup import urllib2 import sys def fetch(): """ Function to fetch data from url """ #Fetching html content from Planet Fedora html_cont = urllib2.urlopen('http://planet.fedoraproject.org') data = html_cont.read() html_cont.close() return data def make_soup(food): """ Function to make the parse the html document and return the list as desired output :arg food: html data """ #Using fetched data BeautifulSoup is giving a BeautifulSoup Object as 'soup' soup = BeautifulSoup(food) #Finding all the 'div' element with attribute class='string' post_list = soup.findAll('div', attrs={'class' : 'blog-entry-title'}) author_list = soup.findAll('div', attrs={'class' : 'blog-entry-author'}) #post_list, author_list: List of required data return post_list, author_list def printem(post_name_list, author_list): """ Function to print Post titles and respectives authors """ #Initialized counter count = 0 #Finding how many list element length = len(post_name_list) #Looping both post titles and corresponding author while count < length: #Finding the text post = post_name_list[count].find('a').string by_author = author_list[count].find('a').string #Printing them count += 1 print str(count) + ': Post Title: ' + post print ' Author: ' + by_author + '\n' if __name__ == '__main__': data=fetch() post, author=make_soup(data) printem(post, author) sys.exit(0) Link to code ------------ `planetparser.py `_ How to execute code ------------------- Run the above script like:: $ ./planetparser.py or:: $ python planetparser.py Example output -------------- Here example output is given below:: 1: Post Title: Week-end hacks Author: Bastien Nocera 2: Post Title: kernel news – 15.07.2013 Author: Rares Aioanei 3: Post Title: morituri 0.2.1 “married” released Author: Thomas Vander Stichele 4: Post Title: Fedora 19 With Google-authenticator login Author: Onuralp SEZER 5: Post Title: Alistando Fedora 19 Release Party Managua Author: Neville A. Cross - YN1V 6: Post Title: How to run Pidora in QEMU Author: Ruth Suehle