litLIB: Twitter spreadsheet example In this example I will demonstrate some of the string and web litLIB functions and build a twitter spreadsheet in which you can view the tweets from some famous people.
I will demonstrate how a web page can be called with parameters and how the returned HTML can be searched for useful information. Please see the end of this topic for a print of the prepared (and attached) spreadsheet.
Twitter, what is it? (
www.twitter.com)
Twitter is a system where people let others know what they are doing. Useful? No? Would you like to know what Barack Obama is doing? or Jami Oliver? or .... LITSoftware (that's us!) It is amazing how many people twitter these days. To see what famous people are twittering try the following twitter pages:
Barack Obama:
http://twitter.com/BarackObama Jami Oliver:
http://twitter.com/jamie_oliver Al Gore:
http://twitter.com/algore In this topic I will demonstrate how to get the tweets from these people into a spreadsheet. This technique can be applied to all kinds of different situations including the reading of RSS feeds, the use of web services...
To start we need litLIB (
www.oraxcel.com/project/litlib) the Excel power functions pack. You can download it from here:
http://www.oraxcel.com/downloads/index.htm?product=litlib Step 1: Retrieving the twitter webpage: The simpel formula to retrieve the twitter page is as follows:
=WebPage("twitter.com/jamie_oliver")
To make it interactive, we want to choose from a list of people, we change it such that the name of the twitter account is read from a cell:
=WebPage("twitter.com/" & F3)
I have setup a list in the same spreadsheet at T3:T7 with the names of the twitter accounts I am interested in. I have created in F7 a dropdown list using the data validation feature of Excel. Select Data - Validation and select:
Allow=List, Source: =$T$3:$T$8
When you select cell F7 a dropdown list appears with all the names stored in the list. When you select a name, Excel will cause the WebPage function to be recalculated and litLIB fetches the page for the seleted twitter account.
Step 2: Extracting useful information from the web page: So far the twitter page is displayed only in source in one cell. We want to see the individual tweets. Inspecting the HTML it appears that all tweets start with: <span class="entry-content">. So the first task is to retrieve all instances of this text. One per row. I started in B5 with the formula:
=InStr(1,F$26,"<span class=""entry-content"">")
The InStr function is used to find the substring in the HTML string starting at position 1. In F26 the WebPage displays the page HTML. Note that I have doubled the double quotes. The next row needs to continue the search from where the first tweet was found: =InStr(B5+1,F$26,"<span class=""entry-content"">")
To extract the tweet we need to know where the tweet information ends. After inspecting the HTML we see that it ends with </span>. So a search needs to be done to find these strings. In C5 I coded: =InStr(B5,F$26,"</span>") and in C6: =InStr(B6,F$26,"</span>"), etc.
The spreadsheet now displays the positions where these strings could be found. Now the task is to extract the characters in between these two positions. In column E I will collect them, coding in E5: =MID(F$26,B5+28,C5-B5-28) and similarly in E6: =MID(F$26,B6+28,C6-B6-28), etc. I have used the Excel function MID to extract a substring starting at position B5 and 26 characters further since this is how long our search string was.
Extracting the tweet text concludes our example. In a similar way the tweet's timestamp can be extracted. The helper columns with the text positions can be hidden using Format - Row/Columns - Hide
Please find attached the spreadsheet I prepared.
And enjoy!