Saturday, November 10, 2012

Mining Tweets of World Cup T20 Match between India and Pakistan: Interesting Insights from Social Network Analysis

I have been quite absent from this space I call my blog for quite some time now, and this is not without reason. The past few months have been extremely busy with lots of traveling (Milan, Venice, Rome, Nijmegen and Copenhagen, all in three to four months' time) and of course the never-ending paper submissions. As I explained in my previous post on online education initiatives, I am also taking Computer Science online courses on Coursera, and this semester I happened to take up a very interesting course by Lada Adamic of the University of Michigan (Social Network Analysis). Though I have myself taught some aspects of Social Network Analysis during a summer course at the Faculty of Computer Science at IBA, Karachi, I found this course intriguing, and the way Lada enriched it with cool applications of SNA was simply amazing.

As an optional part of this course the students were to submit a programming project and I thought what better opportunity than this to submit a part of the TweetCric project being undertaken by our research group. It's always good to get some early feedback on your work in order to gain useful, innovative directions and hence, I decided to blog about my Social Network Analysis project. Readers are welcome to suggest any new directions or give their feedback in comments so as to help us in this project. Following is a description of the project for interested readers of my blog:

Social media applications have considerably influenced the lives of millions, and every day there is a huge number of updates to social networks such as Facebook and Twitter. As of March 2012, more than 400 million tweets were being posted on Twitter each day. The volume of tweets rises significantly during a sporting event, as many sports fans now use social media as part of their viewing experience. Users describe this as an experience full of pleasure and fun, as illustrated by the following Facebook status update during the recent World Cup T20 match between India and Pakistan:


"Facebook comments are more interesting than the match. Already more than two pages of comments. Looks like PakInd Vs Facebook"

Interestingly, the huge amount of content produced during sporting events can be used to analyse players' performance. In light of that analysis sports managers can decide future strategies, and hence the notion of crowd-sourced sports critics can be realized in practice. Researchers have already begun to explore the possibility of using this huge volume of user-generated content to address research problems such as event detection and video annotation for sports summaries [1, 2]. In this work we argue for putting this crowd-sourced content to use for sports strategy analysts and decision-makers. We use social network analysis to highlight significant players during the match, along with an analysis of the reasons why social network analysis methods detect these players.

Social Network Modeling
The data was obtained using the Twitter Search API. During the epic match held on September 30th 2012, a Python script queried the Search API at half-hour intervals, collecting fresh tweets as the match progressed. In total we collected a sample of 43,450 tweets with the hashtag PakvsInd during the match.

We modeled the social graph of the players and commentators using the text content of the tweets. First, using Wikipedia and ESPN CricInfo as external resources, we compiled a list of players and commentators relevant to the India-Pakistan cricket match. This list was then used to detect tweets containing a mention of any player or commentator; the following list shows some sample tweets:

  1. hafeez goes, 15 from 28 balls.. idiot, wasted his time big time. game over. #pakvsind
  2. like the world cup of pakistani batsmen falling against yuvraj. kamran also departs edging to dhoni. pak 56/4 after 9. #pakvsind
  3. rt @maria_memon: rt @maria_memon: afridi! quit playing games with our hearts....our hearts....#pakvsind'
  4. hafeez is the reason for todays batting performance.... after nazir he put all of the team under pressure! #pakvsind
  5. is dhoni trying to piss off pakistanis by bringing in kohli? #pakvsind'
We now explain how we formulate the nodes and edges in our social network of players and commentators. Each player/commentator is treated as a node, and an edge is drawn between two players/commentators if they co-occur in a tweet. As an example, consider tweet 2 above: according to our model there would be edges between yuvraj, kamran and dhoni. In total 8,587 tweets (19.8%) contained a mention of some player or commentator.
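
For readers who want to try this themselves, the co-occurrence model described above can be sketched in a few lines of Python. This is only a minimal illustration and not our actual project code: the name list below is a hypothetical, abbreviated stand-in for the list compiled from Wikipedia and ESPN CricInfo, the function names are my own, and matching is done on whole lowercase words only.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, abbreviated name list (the real list is not reproduced here).
NAMES = {"hafeez", "yuvraj", "kamran", "dhoni", "kohli", "afridi", "nazir"}


def mentioned_names(tweet, names=NAMES):
    """Return the sorted list of known names that appear as whole words."""
    tokens = set(re.findall(r"[a-z]+", tweet.lower()))
    return sorted(tokens & names)


def build_cooccurrence_edges(tweets, names=NAMES):
    """Count, for each unordered pair of names, the tweets mentioning both."""
    edges = Counter()
    for tweet in tweets:
        # Every pair of names in the same tweet contributes one edge count.
        for pair in combinations(mentioned_names(tweet, names), 2):
            edges[pair] += 1
    return edges


# Sample tweets 2 and 4 from the list above.
tweets = [
    "like the world cup of pakistani batsmen falling against yuvraj. "
    "kamran also departs edging to dhoni. pak 56/4 after 9. #pakvsind",
    "hafeez is the reason for todays batting performance.... after nazir "
    "he put all of the team under pressure! #pakvsind",
]
edges = build_cooccurrence_edges(tweets)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

Tweet 2 yields the three edges among yuvraj, kamran and dhoni, and tweet 4 yields one edge between hafeez and nazir. The resulting weighted edge list is the kind of input that a tool such as Gephi can load.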

The following figure shows the visualization that was obtained from this social network (Gephi was used for the generation of the graph)

Modeled social network of players/commentators during World Cup T20 India-Pakistan match
As is clear from the figure, there are three communities within this social network. Nodes are sized according to betweenness centrality, and it can be seen that Hafeez is the node with the highest betweenness. This is because he was the captain of the Pakistani team in that match, and according to most of the tweets Pakistan lost due to his poor captaincy, poor field placements and poor batting. The node with the second-highest betweenness, Kohli, is the player who was named man of the match and scored the most runs, leading India to a comfortable victory. Hence, it can be seen that social network analysis gives important insights into sporting events. Natural language processing as an alternative approach seems to lack the precision and efficiency that social network analysis offers. Our team has long been arguing for a hybrid approach that utilizes both natural language processing and social network analysis to address various research questions in the fields of Information Retrieval and Web Information Systems, given the low scalability and speed of natural language processing alone [3].
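
Since betweenness centrality drives the node sizes, here is a small sketch of how it can be computed for an unweighted, undirected graph using Brandes' algorithm. We relied on Gephi for the actual figure, so this is purely illustrative; the toy graph below is made up and is not our dataset.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalized betweenness for an unweighted, undirected graph.

    adjacency maps each node to the set of its neighbours.
    """
    score = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search: count shortest paths and record predecessors.
        order = []
        preds = {node: [] for node in adjacency}
        sigma = {node: 0 for node in adjacency}
        dist = {node: -1 for node in adjacency}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to the source.
        delta = {node: 0.0 for node in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2.0 for node, value in score.items()}


# Made-up toy graph, for illustration only.
toy_graph = {
    "hafeez": {"nazir", "dhoni", "afridi"},
    "nazir": {"hafeez"},
    "afridi": {"hafeez"},
    "dhoni": {"hafeez", "kohli"},
    "kohli": {"dhoni"},
}
scores = betweenness_centrality(toy_graph)
for name, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(name, value)
```

In this toy graph hafeez lies on the shortest paths of five node pairs and dhoni on three, so they receive the largest scores; the leaf nodes score zero.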

We now analyse the communities within this dataset. The community represented in blue is mostly comprised of Indian players, and it makes sense that they form a separate community. However, the inclusion of Misbah and Ajmal in this community is weird since both are Pakistani players. Further analysis reveals why this occurred: Ajmal took the important wicket of Sehwag, which pulled him into that community, and Misbah being mentioned once with Ajmal forced him there too. The community in dark green represents for the most part Pakistani players, with the exception of Dhoni, who is the Indian team captain; this occurs because Twitterers compared Hafeez's captaincy with Dhoni's, thereby forcing Dhoni into that community. Lastly, the community in aqua green represents players who did not play in the match, with the exception of Afridi. He was forced into that community by tweets from Pakistani cricket fans suggesting that he be dropped, which mentioned him alongside those not playing the match.

Lastly, as I mentioned at the beginning of this post, any feedback or ideas are welcome. Interested students who want to join this project are requested to contact me personally via email or social networks.

References:
[1] J. Nichols, J. Mahmud, and C. Drews. Summarizing sporting events using twitter. In Proceedings of the 2012 ACM international conference on Intelligent User Interfaces, IUI ’12, pages 189–198, New York, NY, USA, 2012. ACM.
[2] A. Tang and S. Boring. #epicplay: crowd-sourcing sports video highlights. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’12, pages 1569–1572, New York, NY, USA, 2012. ACM.
[3] A. Younus, M. Qureshi, F. Asar, M. Azam, M. Saeed, and N. Touheed. What do the average twitterers say: A twitter model for public opinion analysis in the face of major political events. In 2011 International Conference on Advances in Social Networks Analysis and Mining, pages 618–623. IEEE, 2011.

Friday, May 18, 2012

Online Education Initiatives: A Hope for Education in Less Developed Countries


I remember very well my time in the Computer Science Department of Karachi University, when teachers who did not take classes really annoyed me. Other class-fellows would call me crazy on account of being so nerdy, but I knew this was a valuable period of our life which would never come back. This was the age when the mind is ready to absorb all knowledge, and thanks to our messed-up education system (not to forget the loads of politics that pollute it) it was literally being wasted. The dilemma of technological sciences such as Computer Science in less developed countries like Pakistan lies in their being more of a hype than a science. In my part of the world students flock to Computer Science to get good jobs after graduation: of course this is a necessity and the point of a good education, but whether it should be the only goal is the real question we should address.

Back then there was a frustrating time when the Object-Oriented Programming teacher gave us the option of learning OOP concepts with either C# or C++, and unfortunately most of the class went for C# because it was in demand in the job market. At that point I realized how tough a time Computer Science would have in Pakistan, and this remains true to this day. Sadly, not only students but teachers too have promoted the job-oriented study model, leading to a myth that Computer Science is all about sitting at a desk writing code in .NET or PHP (or any other programming language for that matter).

Meeting the well-known scientist Rakesh Agrawal from Microsoft Research confirmed my assertions about the pathetic state of affairs of technological sciences in countries like India and Pakistan. He shared the same dissatisfactions as me and strongly criticized the industry in the less developed countries. Equally sad is the state of affairs at the national universities in South Asia, and the situation is changing at a very slow pace. When Stanford announced its online courses, I saw acquaintances in my social network sharing the news, and the ones most excited about these online courses were undergraduate students from institutes of my country. This, as I see it, is a silver lining amidst the dark clouds, as online education initiatives like Coursera, EdX and Udacity will now grant access to quality education to students from all over the world. This in my opinion is a huge step towards bridging the digital divide, and it is now up to students in the developing world to make the most of this opportunity. Today's connected society gives easy and massive access to knowledge, unlike the situation I had back in my undergraduate days, and I feel students today are far more blessed than students of my time.

What started as an educational initiative by accomplished Stanford Professors Daphne Koller and Andrew Ng has now turned into a global phenomenon, with the best universities contributing to make knowledge open for all. If studying at the world's reputed universities (Stanford University, MIT, Harvard, University of Michigan, University of Pennsylvania etc.) was ever your dream, then there can be no better time to go and get that dream. Some students might take this as an exaggerated statement, but it comes from me after personally taking two online courses this semester and enjoying them to the maximum. Furthermore, Coursera statistics also confirm the value that online education has now added to universities, a value they could never have achieved otherwise, as Andrew Ng puts it: "I normally teach 400 students," Ng explained, but last semester he taught 100,000 in an online course on machine learning. "To reach that many students before," he said, "I would have had to teach my normal Stanford class for 250 years."



It is a generally held notion that the academic culture and the styles of teaching in our part of the world are outdated and boring. I can certainly confirm this assertion on account of my experience in Pakistani academic circles for quite some time now. For the most part, higher-education circles in developing regions limit ideas to an academic document on a shelf, quite unlike the way things are done in the top research universities of the world. Students have always wanted to know how the ideas that they study in the classroom apply to the real-world problems around them. With world-class professors offering online courses, there is an opportunity to get many of those questions answered.

Online education as a phenomenon is not new, and for years people in less developed regions have been skeptical of it, but it's quite different with Coursera and other similar initiatives. The revolutionary ideas behind these initiatives are testing, grading, student-to-student help and the awarding of certificates on completion of a course. Daphne Koller, a Stanford computer science professor who founded Coursera with Ng, explained in her talk at LinkedIn last week, "It will allow people who lack access to world-class learning - because of financial, geographic or time constraints - to have an opportunity to make a better life for themselves and their families."



So the next time students come to me seeking advice on how to start with research or how to apply to foreign universities, I'd recommend that they take some courses (related to their area of interest) on Coursera or any such platform. With such initiatives coming from the world's top-class universities there is hope for a revolution in higher education, allowing students from all over the world not only to hear top-quality lectures but also to do homework assignments, be graded, receive a certificate for completing the course and use that to get a better job or gain admission to a better school.

Sunday, April 15, 2012

WWW2012 Poster: New Media vs. the Old Media


Today's social-media savvy age has considerably changed the paradigm of traditional journalism. Interestingly, it has also led to new debates within the journalism and media industry, with supporters of social media terming it a platform for the masses' voice while opponents term it gibberish and noise. Old-school journalism disregards the significance of social media popularity for any article on the pretext that “journalism is not about feeding the masses with whatever crap they want to be fed with.”

It turns out that this entire debate is not as simple as it appears at first glance. What old-school journalism advocates do not take into account is the age-old phenomenon termed “media bias” by the social sciences research community. A famous paper published in 2004 by the Department of Political Science at UCLA and the Department of Economics at University of Missouri studies the bias of famous news outlets in the US. Since then there have been various attempts at studying biases in traditional media platforms (such as New York Times, Fox News, Washington Post, CBS, Wall Street Journal), with most of these coming from the sciences (social science, political science, Computer Science). Empirical evidence is what is given utmost importance from a scientific viewpoint, and unfortunately the social media circles in Pakistan tend to ignore this angle altogether. This brings into the picture the problem of measuring bias in various forms of media, which turns out to be a huge research challenge in itself. The solution: yes, social media with its insights and popularity judgements can serve as a tool not just for the masses' voices but also for measurement of bias in traditional media, and this is exactly what a team of researchers in IBA's Web Science group has done.

The crucial nature of the media industry makes it all the more essential to have ways and means of verifying its content. This leads to the natural question of how new media, namely social media, can help measure the inevitable biases inherent in traditional media. A few of these questions have been answered by researchers from one of Karachi's most prestigious educational institutes, the Institute of Business Administration, who investigated differences between news appearing on traditional and social media platforms via publicly available data from the famous microblogging site Twitter. Being a part of this team made me delve deeper into various aspects of media both internationally and in Pakistan, my observation being that today's media tend to ignore the crucial role of social media and do not take popular demands into account. With this conclusion, we argue for a paradigm shift in how traditional media platforms perceive the new media landscape, and the sooner they embrace this new world the better for their own survival.

Some technical details of the study warrant an explanation. The Jaccard Similarity metric from data mining has been used to investigate the differences in named entity coverage between the 16 million tweets posted during the time period of the Egyptian uprising (tweet data obtained from the TREC 2011 microblog track) and the New York Times articles about Egypt. The figure below shows our results:



It demonstrates a notably low level of shared coverage (Jaccard Similarity below 0.5 for all days), which we take as evidence of media bias. Moreover, we extend this study to a local level (for Pakistani media outlets) on a daily basis for the month of November. The extension utilizes topic models (specifically standard LDA and Twitter-LDA) to discover similar topics in the two media, followed by a ranking function which computes the popularity of a news item on the two platforms. This is then compared with a manually ranked list, the final result being that the ranks obtained from social media (tweet data) match the human-annotated ranks more closely.
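
For readers unfamiliar with the metric, Jaccard Similarity is the size of the intersection of two sets divided by the size of their union. The sketch below shows the computation on two made-up named-entity sets for a single day; the entity names are hypothetical and are not taken from our data.

```python
def jaccard_similarity(a, b):
    """Return |A intersect B| / |A union B| for two collections of items."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention chosen here (an assumption): two empty sets score 0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical named entities seen on one day in each medium.
tweet_entities = {"mubarak", "tahrir", "cairo", "aljazeera"}
news_entities = {"mubarak", "cairo", "obama"}

similarity = jaccard_similarity(tweet_entities, news_entities)
print(similarity)  # 2 shared entities out of 5 distinct ones -> 0.4
```

A value of 1 would mean both media mention exactly the same entities, while a value near 0 means they hardly overlap.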

For those interested, here's the abstract of our paper:
It is often the case that traditional media provide coverage of a news event on the basis of journalists’ viewpoints - a problem termed in the literature as media bias. On the other hand social media have given birth to an alternative paradigm of journalism known as “citizen journalism”. We take advantage of citizen journalism to detect the bias in traditional media and propose a simple model for empirical measurement of media bias.

Note: This is part of a long-term project by the Web Science research group at Institute of Business Administration, Karachi, Pakistan and we welcome interested students to be a part of our project.

The slides for the work can be viewed here and the full 2-page poster paper can be downloaded from here.

Sunday, March 4, 2012

Three cheers for Professor Moon

A few days back, when I read in my Facebook news feed an update from Professor Sue Moon that she is now a tenured Professor at KAIST, I was immensely delighted. This post is a special tribute to Prof. Sue Moon from a student who did not get to spend much time with her, but whatever time I spent with her played a huge role in my learning path. It all began in Spring 2009 when I took up Professor Moon's course on Advance Networking. At first she seemed hard to impress, but then I figured out it's her way of teaching the students. The Advance Networking course she was teaching us was special; it turned out to be one of the toughest and yet greatest learning experiences of my life. She had specially designed the course keeping in mind the struggles young researchers have to face. Throughout the semester we were expected to read papers, write a critique of each paper and present some selected papers in class as if they were our own. This activity turned out to be quite hectic, and each student used to dread the day when he/she had to present; one strong reason for that was Professor Moon's fiery questions about the technical aspects of the paper. She used to spend hours polishing our presentation and paper reading skills, asking us to read papers from a critical angle so as to highlight their strong and weak points. She taught us a skill that is very valuable in the scientific community: captivating the audience when giving a technical talk. This rare skill is seriously lacking even among the best scientists of our community.

The semester ended and we all got back to our busy research life at KAIST, but in the later parts of my Master's degree I realized that her teaching and the way she groomed us in that course were extremely helpful. She literally taught us how to fall in love with research: an ability quite rare even among graduate students in the world's top universities. She keeps these technical how-to talks on her Web page and I have gone through all of them; I would definitely recommend them to all aspiring Computer Science researchers out there.

I want to particularly thank Prof. Moon for all she gave me. Knowledge, in my opinion, is a priceless gift in itself and I am out of words to express my gratitude to her. Thank you, Professor Moon, for playing a role in my research path; your training has proven to be a great gift for me. Although my own Master's advisor Professor Kyu-Young Whang taught me the most during my stay at KAIST (his training has also been invaluable in shaping me as a researcher), Professor Sue Moon is special because she is one of the most outstanding women in Computer Science I have known. This field surely needs more inspiring women like her. I hope to meet her some day in order to thank her in person.

Saturday, August 27, 2011

Visit to Russia: RuSSIR/EDBT Summer School

Although I constantly microblogged on Twitter during my trip to Russia, nothing replaces a detailed blog post when it comes to coverage. I definitely wish to have an archive of details for myself and for students of Information Retrieval (and of course other related areas) around the world. I, along with my husband and colleague Muhammad Atif Qureshi, visited St. Petersburg, Russia from 14th August, 2011 to 20th August, 2011 to attend the prestigious Russian Summer School in Information Retrieval (RuSSIR), which was co-located with the Russian Young Scientists' Conference where we presented our research work. This year's RuSSIR was quite special as the EDBT summer school was also co-located with it, and as such the breadth and depth of the lectures presented at the school was immense. Here is a brief overview of the lecture sessions that I attended, along with some good news for students in Karachi, Pakistan.

SocM Session: The Social Mining session was conducted by two well-known industry people, namely Vladimir Gorovoy of Yandex and Yana Volkovich of Barcelona Media Innovation Center. It was highly interactive and practical, with a recommendation task for students for which they were provided with a real dataset from Yandex Market. Here is a link for students who wish to try it out: Yandex Market practical task from RuSSIR. The session fundamentally covered various aspects of mining social media data. It began with a very apt observation borrowed from Google's analytics evangelist Avinash Kaushik: "Social media is the hot thing today, almost every one seems excited to get involved in it but no one actually knows how." The session covered that "how" with a glimpse into graph mining methods (PageRank, TunkRank and TwitterRank being some examples), models for opinion mining of reviews left by customers, social media engagement metrics and social innovation platforms for the future. In short, it was an extremely engaging and knowledge-enriched session, particularly helpful for social media analytics students. I learned a lot during the course of this session and am particularly thankful to Dr. Yana Volkovich for some of her wonderful suggestions that will really help me in my own research.

Plenary Session (Knowledge Harvesting from Web Sources): I found this session very informative and full of pointers for new research ideas, although it was a bit away from my own research area. Gerhard Weikum (Research Director at Max-Planck Institute for Informatics (MPII) in Saarbruecken, Germany) presented a comprehensive overview of research methodologies that can turn the Web into a large-scale Knowledge Base. A few examples of such Knowledge Bases include DBpedia, KnowItAll, ReadTheWeb, and YAGO-NAGA, as well as industrial ones such as Freebase and Trueknowledge. The tutorial presented research methodologies along the avenue of knowledge harvesting, with examples of work on the unification of WordNet and Wikipedia in YAGO, identification of a long tail of instances of entity classes through harvesting textual snippets on the Web, and entity search through language model ranking. Overall the session was intense and the slides quite heavy with lots and lots of natural language processing material, but it was definitely a great learning activity from the point of view of tools to use for your own research.

SentA Session: This session was one of the most exciting ones for me as my own research is centered around Sentiment Analysis. Professor Mike Thelwall, who heads the Statistical Cybermetrics Research Group at the University of Wolverhampton, delivered the talks in this session, which mostly centered around SentiStrength, the sentiment strength detection tool of his research group. We were also taken through a live demonstration of the tool, after which Professor Thelwall explained in detail its various features along with the underlying algorithms and its experimental evaluations. The SentiStrength team has done a pretty good job of maintaining this tool, and the best thing about it is that the word list marked with each word's positive/negative strength is publicly available for research purposes. During this session students were also introduced to machine learning methods for Sentiment Analysis, with detailed explanations of feature selection, gold standard creation and 10-fold cross-validation. To sum up, this session was extremely useful for students wishing to make a career in Sentiment Analysis, and I specially thank Professor Mike for his valuable suggestions on various aspects of the field.
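
For students who have not met 10-fold cross-validation before, the idea is to split the labelled data into ten parts, train on nine and test on the remaining one, rotating until every part has served as the test set once. Below is a minimal sketch of the splitting step only; the function name is my own, it is not taken from the session material, and in practice the data would usually be shuffled first.

```python
def k_fold_splits(n_items, k=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and the number of items")
    indices = list(range(n_items))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Example: 25 labelled documents split into 10 folds.
splits = list(k_fold_splits(25, k=10))
for train, test in splits:
    print(len(train), len(test))
```

A classifier would then be trained and evaluated once per split, and the ten scores averaged.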

ColIR Session: This was a short session conducted by Chirag Shah of Rutgers University. It touched completely new dimensions within the field of Information Retrieval namely Information Retrieval facilitated through collaboration. According to Professor Chirag Shah with the emergence of collaborative Web platforms, information retrieval has also moved towards a completely new dimension. The traditional view of IR is that it is an individual activity: the Collaborative IR community challenges this notion by describing it as a co-ordinated activity and they have also proved their ideas in both theory and practice. This session covered both the theory and practice behind collaborative IR situations, systems, and evaluation techniques.

TopK Session: This session, presented by the two charming ladies Sihem Amer-Yahia and Julia Stoyanovich, was simply fantastic. We were introduced to a whole new approach to solving some of the toughest problems in social media, and this approach comes from the old, classical database field. The session mainly centered around Top-K processing, one of the well-known methods for ranked retrieval within the DB-IR research community, which was presented in a unique manner with a special focus on applying it to search and information discovery on the Social Web. Such applications were discussed from two significant viewpoints: 1) efficiency (minimizing both space and time requirements) and 2) user satisfaction. Both researchers presented a comprehensive overview of their papers published in top Database and Information Retrieval conferences: VLDB, ICWSM, SIGMOD and ACM HT. Their research within the efficiency dimension was based on incorporating upper bounds into classical top-k algorithms (the threshold algorithm and the no-random-access algorithm) in order to minimize time and space complexity. Their research within the user satisfaction dimension presented the fundamental idea of scaling up user studies to thousands of users by leveraging crowd-sourcing platforms such as Amazon Mechanical Turk. Currently I am reading these papers to look for dimensions that can be applied to my own research in Social Media Analytics.
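
To give a flavour of the classical threshold algorithm mentioned above, here is a small sketch. Note that this is the textbook version with sum as the aggregation function, not the extended variants from the presenters' papers, and the two score lists ("relevance" and "social") are invented for illustration. The algorithm reads all lists in descending score order in parallel, looks up the full score of each newly seen item, and stops as soon as the k-th best total is at least the sum of the scores at the current depth.

```python
import heapq


def threshold_top_k(score_lists, k):
    """Return (top-k list of (total, item), number of rows read per list).

    score_lists is a list of dicts mapping item -> score; every item is
    assumed to appear in every dict.
    """
    if k <= 0 or not score_lists:
        return [], 0
    sorted_lists = [
        sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        for scores in score_lists
    ]
    seen = set()
    top = []  # min-heap of (total, item) holding the best k seen so far
    rows_read = 0
    for depth in range(len(sorted_lists[0])):
        rows_read = depth + 1
        threshold = 0
        for ranked in sorted_lists:
            item, score = ranked[depth]  # sorted access
            threshold += score
            if item not in seen:
                seen.add(item)
                # Random access: fetch this item's score from every list.
                total = sum(scores[item] for scores in score_lists)
                if len(top) < k:
                    heapq.heappush(top, (total, item))
                elif total > top[0][0]:
                    heapq.heapreplace(top, (total, item))
        # No unseen item can score above the threshold, so we may stop.
        if len(top) == k and top[0][0] >= threshold:
            break
    return sorted(top, reverse=True), rows_read


# Invented scores for four items under two criteria.
relevance = {"a": 9, "b": 8, "c": 3, "d": 1}
social = {"b": 9, "a": 7, "c": 5, "d": 2}
result, rows_read = threshold_top_k([relevance, social], k=2)
print(result, rows_read)
```

In this example the top two items are found after reading only two of the four rows in each list, which is exactly the kind of saving that makes top-k processing attractive for large social datasets.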

Here is an archive of tweets during my attendance at RuSSIR:

#RuSSIR sessions kick off with interesting presentation on Social Media Mining by @yvolkovich and Vladimir Gorovoy

Not many people know abt. a social network exclusively devoted to travel and hospitality: CouchSurfing


Can an online social network build enough trust to allow strangers to sleep on each others’ couches: Adamic's paper http://bit.ly/prdxTy


"The Web today is the largest knowledge encyclopaedia - we need it to turn it into a comprehensive Database" - Gerhard Weikum at #RuSSIR


In a very interesting talk by Mike Thelwall explaining the working of the famous sentiment analysis tool SentiStrength #RuSSIR


Automatic sentiment analysis has more or less the same accuracy as human sentiment analysis due to complexity of problem - Mike Thelwall


A look into inside of Yandex Market by @vgorovoy in session of Social Media Mining http://twitpic.com/66vfkp


Interesting talks in TopK session at #RuSSIR: essentially about converting social media research problems to traditional database problems


Researcher from Barcelona Media Innovation Center explains the science of social media mining #RuSSIR


Mention of work of KAIST's @sbmoon in #RuSSIR in Social Media mining lecture


Andrey Plakhov explains how entity-oriented search works at Yandex: Russia's search engine that has larger market share than Google Russia


Wonder where this rule came from #RuSSIR #Yandex http://twitpic.com/67e4w1


Sihem Amer-Yahia of Qatar Computing Research Institute continues day 3 of session on TopK Processing for Social Applications


Wonderful graphic by @yvolkovich on visualization of social media conversations during Spain protests #RuSSIR


Take-home from ColIR session: Science is all about collaboration unlike the Humanities #RuSSIR

AlJazeera English tracking information of users who visit the site for improved user experience - Sihem of QCRI at #RuSSIR


SearchTogether by Microsoft Research takes user-mediated Collaborative Information Retrieval one step ahead #RuSSIR


ColIR session: reason behind failure of Google Wave was the difficulty of the system requiring a 60-minute video tutorial #RuSSIR


Take-home of TopK #RuSSIR session: Social Web is full of challenges, our online social experience will be as good as we researchers make it


A week of super-duper learning and knowledge-sharing, intense discussions and lots of research take-aways. Hats off to #RuSSIR team!!

In short Russia is a wonderful place to visit and St. Petersburg is mind-blowing. Russian people are extremely hospitable, friendly and what's best about them is their love and passion for Mathematics. All in all Russia is a great place to visit if you are a Computer Science researcher as it is full of wonderful Computer Scientists both established researchers and young science-aspiring students.

Finally, I am glad to announce that the Web Science group at the Institute of Business Administration will conduct an open seminar to educate Pakistani students in some of the above-mentioned topics. Feel free to contact me with any suggestions for the seminar, or any topic you wish to include.

Thursday, July 28, 2011

Thoughts on Computer Science's 'Sputnik Moment'

I have not had the chance to blog of late. The past few months have been extraordinarily busy with lots of research ideas in the pipeline, and I, along with my colleague and husband, am also into teaching now with the newly introduced "Introduction to Web Science and Technology" course at the Faculty of Computer Science, Institute of Business Administration. It's been a great experience working in Pakistan, trying to bring the Computer Science research culture here on par with international standards: it is a tough but all the same a fascinating journey.

Today I am writing at the request of a student who asked for my thoughts on the debate being conducted in the New York Times on the topic of "Computer Science's Sputnik Moment". It all began when I shared one aspect of this debate on my Facebook wall: the viewpoint of Dr. Ed Lazowska (University of Washington), who believes Computer Science to be central to our future. What particularly appealed to me was his statement below:

For students who want to change the world, there is no field with greater impact or leverage than computer science. Just take a look at the 2010 report by the President's Council of Advisers on Science and Technology, which characterized computer science as “arguably unique among all fields of science and engineering in the breadth of its impact.”


I received a private message from a student who disagreed with this viewpoint and shared Vivek Wadhwa's arguments from the same debate. The student, who happens to be an alumnus of FAST-NUCES, wanted to know my view on the famous "tech bubble." The premise of his argument was that today's students are flocking to Computer Science out of a passion to become the next Zuckerberg, and that the driving factor behind the rise in Computer Science grads is gimmicky social media applications which, in spite of being a major innovation, are a bubble. The premise is no doubt strong; however, the point being missed here is the difference between a scientist's approach and a technologist's approach. Wadhwa lacks the insight necessary to grasp the point being made by Dr. Lazowska, which is that Computer Science as a whole new science has the potential to impact almost all other fields of science: it is indispensable for society today.

Wadhwa is an entrepreneur turned academic, and this in my opinion may be one of the reasons he fails to grasp the essence of Computer Science as a whole. It is true that a large chunk of today's students run after the sparkling thing called social media, but it often happens that their perception of Computer Science changes once they explore the theoretical marvels of this field. A glaring example of this is the Web Science course I am conducting at the Institute of Business Administration. Initially students did not understand what the course was about and what they would be learning in it, for sadly the Web to them means ASP, PHP, HTML and nothing beyond that. Once we began teaching the Web from a scientific perspective, students were simply amazed; we are at the point where they think beyond SEO and are well aware of the science behind search engines. The point being illustrated is that students may not see the real depth in science at first, but that is not just their fault: those responsible for Science curricula should be doing things the right way, and this will definitely make a difference.

Secondly, the point is not about lasting careers or high-paying jobs: it's about making a difference to the world through Computer Science. The point is about pursuing Computer Science because your country needs you and not because you need a mere job! That's what's meant by a "Sputnik moment." Look at the reports that Lazowska links to -- Computer Science is a key to the future due to its vast potential to deliver in areas that matter to our countries, such as the health sector, the energy sector, the military surveillance sector and many others. I can go on and on, but what is really disturbing is the naive approach of our students, who have limited life goals and no vision on a broader scale.

Furthermore, the examples that Dr. Lazowska quotes are of Noam Chomsky, Watson and Crick. Obviously, these people were not new kids on the block aiming to become the next Zuckerberg, and were not simply running after some social media setup. They were scientists with a vision: a vision to further knowledge so that it serves as a foundation for generations to come. Many of the Google tools you play with and spread on your social networks would not even have existed without Computer Scientists like Dennis Ritchie, Ken Thompson, and Brian Kernighan.

I would love to hear thoughts on this, particularly from the Pakistani Computer Science circles, be it students or researchers. It is hard to get people in Pakistan engaged in a knowledgeable debate, and this is true even for people who have done their PhDs or postdocs, but it's always worth a try. So feel free to add your viewpoint in the comments section.

Tuesday, May 3, 2011

Interacting with Pakistani Students: Some Tips for Taking Up a Research Career

Almost every week or two I receive emails from students around the world asking for help in their research work and tips on getting into a research career. However, there is a marked difference between the emails that I receive from Pakistani students and the ones that I receive from students in other parts of the world. European students in particular normally request my Master's thesis or papers and at times ask questions about the techniques we use in our papers. Similarly, students from Korea, China, Hong Kong, Taiwan, Egypt, and Malaysia ask brilliant questions with respect to research and are more focused on a specific topic. In fact, they even suggest novel additions to existing work, including pointers to useful techniques we could incorporate. In short, they have already identified a research path for themselves and work towards it, with their questions aimed at getting guidance on their chosen topic. On the other hand, most of the students from Pakistan have this single question: "please suggest me some research topic or research idea?"

Today I feel the urge to write specifically to address this question from Pakistani students, as I feel this issue has to be taken up carefully. My advice for such students is very simple: no one can tell you a research topic of your interest. I am sure Pakistani CS students would find this answer slightly confusing, so I will elaborate further. Just as nobody can tell you what your favorite food is, no one can tell you what area of research you should pick, for that is completely dependent upon your likes and dislikes. The fundamental problem with such a question is that the students do not even narrow down the research area/domain within which they want to work, and instead simply ask others to suggest a research topic for them. It would be understandable if they at least narrowed down the research area in which they wish to work. Dear students, please remember one thing: if your research area is chosen for you by someone else, you may be able to finish the task at hand, but you will never be able to realize the passion that is needed in research. You will never enjoy your research, and research without enjoyment can never attain fruitful results.

"If you fancy a career as a researcher, you'll spend tens of thousands of hours on work over the next 10 years. The only way you're ever gonna spend 10,000 hours on research is only when you truly deeply love it. If something really engages you and makes you happy, then you will put in the kind of energy and time necessary to become an expert at it." - Click for Source

This is not to blame or strongly condemn the students. In fact, my point is to convey what mistake our students make, and I do not blame them for this state of affairs. In a country where education is more of a corporate business, where Computer Science education in particular is hijacked by technologists who know nothing about science, and where Professors do not know international standards of research and are not even aware of the best academic conferences of their field, such confusion among students is bound to exist. The problem is clearly a lack of guidance for the students, and not many people wish to do anything about it. In fact, there are some "technology experts" who are even cashing in on this "lack of guidance" for their own fame and publicity. The state of affairs is so pathetic that our students do not even know what a research paper is, let alone how to read one, and hence they fail to grasp the whole point of scientific research. When a student has no idea where to begin, how can he/she get any idea about a research topic?

Here I list some tips based on my research experience. These are specifically for students who wish to do research but have no clear idea of how to proceed.

1. Narrow down your research area: if you do not know which research areas exist within the broad field of Computer Science then no worries: simply visit the web sites of the Computer Science departments of famous research universities such as MIT, Stanford, Berkeley, CMU, Cambridge, Oxford, CUHK, ANU, KAIST etc. and browse to their research sections, where you will find many research areas listed. Do not just get fascinated by the name of a particular research area; read more about it and then decide whether the area interests you or not.

2a. After step 1, i.e. identification of your research area, find out the conferences/journals that are well-known for that particular area. This task will not be hard either: use DBLP, a Computer Science bibliography web site listing reputed conferences and journals. The name of the conference/journal will pretty much tell you whether it's for the field you have identified or not.
2b. In addition to step 2a, search online for the names of famous research groups working in your identified research area. For instance, if the field you have narrowed down is Social Computing, then simply search for "Research Groups Social Computing" and browse the works of the well-known groups of that domain.

3. After listing the conferences and journals within the research area of your interest, read the most popular and latest papers of those conferences. For example, anyone interested in distributed systems would immediately discover Google's MapReduce paper as the de facto distributed computing standard and should read that. Another significant factor to look for is the number of citations a paper has received: read the most cited papers first to get a grip on the topic. Google Scholar will help you in finding the number of citations for a paper.
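
For readers who like to keep their reading list in a small script, the ordering suggested in step 3 (most cited first, and the more recent paper first when citations are equal) could be sketched roughly as below. This is only an illustration: the `Paper` record and the `reading_order` helper are names I made up, the tie-break by year is my own assumption, and the titles and citation counts are placeholders that you would replace with numbers looked up by hand on Google Scholar.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    """Hypothetical record for one entry in a reading list."""
    title: str
    year: int
    citations: int  # entered by hand, e.g. from Google Scholar


def reading_order(papers):
    """Return papers sorted most cited first; ties go to the newer paper."""
    return sorted(papers, key=lambda p: (-p.citations, -p.year))


# Placeholder entries, not real papers or real citation counts.
papers = [
    Paper("Paper A", 2008, 120),
    Paper("Paper B", 2010, 950),
    Paper("Paper C", 2011, 120),
]

for rank, paper in enumerate(reading_order(papers), start=1):
    print(rank, paper.title, paper.year, paper.citations)
```

Citation counts favor older papers, so a list ordered this way is only a starting point; the latest papers of your chosen conferences still deserve a look even if few people have cited them yet.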

After having read 20-30 papers you will definitely come up with a crude idea. Refining that idea will of course require discussions with your advisor/senior researchers; in fact, you can even email the authors of some of the papers you read. Researchers love to share and increase knowledge, for that is the whole point of research. Unlike the corporate, commercial world, the research world does not like to hide things, for it is all about knowledge-sharing, and a researcher who does not share his/her knowledge is never looked upon with respect.

Another handy tool that can help immensely in research is Twitter. Although it's known as a social networking or micro-blogging service, many in the scientific community regard it as the new journal archive. Some of the groups you identify within your areas of interest will be active on Twitter, and you can follow them there for updates, for their latest works, and often for useful reading material that can help you a lot in your research. But a note of caution: don't bother them with silly questions like "please tell me a topic of research". They are quite mature researchers with top-quality students, and when anyone comes up to them with such questions they will consider that student an alien. This is where you have to be extra-cautious.

Feel free to email me with any questions, and I will be glad to help. Please remember that a research career seems attractive on the surface, but it requires more hard work than you would normally have to do in the software house or technology culture of Pakistan, because there are no ready-made sweets in research: crafting and scientific knowledge discovery are what you will have to master, which of course requires years and years of effort.