Tuesday, July 16, 2013

Google Anita Borg 2013 Annual Retreat in Zurich, Switzerland

As a recipient of the Google Anita Borg Memorial Scholarship 2013 for Europe, the Middle East and Africa, I was invited to the annual retreat at Google's offices in Zurich, Switzerland. As readers of my blog know well, I keep a diary of the significant research events I attend, so here goes.

The Google Retreat 2013 was held from 30th June, 2013 to the morning of 3rd July, 2013. The main activities of the retreat were spread over two days (i.e., 1st and 2nd July, 2013), with 30th June reserved for registration and a welcome reception at the hotel, where the scholars and finalists got to know each other through a very interesting game of networking Bingo. The final day consisted of a very brief breakfast tram tour of Zurich.

Below is a picture of the networking Bingo card Google gave us. For those unfamiliar with it, Bingo is a game popular in the United States and Canada in which players mark off numbers on a 5x5 card, aiming to complete a line vertically, horizontally or diagonally. Google's version, however, was a game not of chance but of socializing and networking with fellow scholars and finalists; and it was great to learn that most of them were fans of nerdy TV shows like "The Big Bang Theory" and took "nerd" as a compliment :-)




The retreat officially kicked off with Oliver Heckman, Engineering Director at Google Switzerland, giving an overview of the engineering initiatives at Google Zurich. Many amazing Google products owe a great deal to the hard work of engineers in this European office, for example Google Maps, the Google Knowledge Graph and YouTube. Oliver also demoed Google's upcoming Conversational Search, which seems to be a great leap forward for Web search engines.

Next up was a technical talk by Doug Aberdeen, who holds a PhD and whose area of expertise before joining Google was Reinforcement Learning; at Google he works with the Gmail product team on things like spam detection and, more recently, on my personal favorite, "Priority Inbox". His talk was full of valuable insights for those working in Machine Learning, which is why I enjoyed it so much. Doug's talk differed from typical machine learning talks in that it looked at machine learning from a practical, realistic point of view: how to approach machine learning when building large-scale products that have to be deployed in the real world. He said that machine learning people may be fascinated by the huge amount of data available to Google engineers, but the fact of the matter is that even Google does not always have ground-truth labels, and this is where the real Machine Learning challenge comes in. A somewhat astonishing fact for me was that 90% of the machine learning algorithms at Google are simple parallel logistic regression; parallelizing logistic regression at Google scale, however, is far from trivial. Doug's talk was followed by a tech talk on the engineering behind YouTube and how YouTube detects copyright violations; it reminded me of a TED Talk by Margaret Gould Stewart on the same subject.
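Doug's point about parallel logistic regression made me want to write down the general idea. Doug did not describe how Google does it, so the following is only an illustration of the data-parallel pattern, not Google's method: a minimal Python sketch, assuming plain batch gradient descent in which each data shard computes a partial gradient of the log-loss and the partial gradients are summed before each weight update. The function names (`shard_gradient`, `train_data_parallel`, `predict`) and the toy data are hypothetical, and the shards run one after another here; in a real cluster each shard would sit on a different machine.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def shard_gradient(w, shard):
    """Log-loss gradient summed over one shard of (features, label) pairs."""
    grad = [0.0] * len(w)
    for x, y in shard:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for j, xj in enumerate(x):
            grad[j] += (p - y) * xj
    return grad

def train_data_parallel(shards, dim, lr=0.5, epochs=200):
    """Batch gradient descent; each shard's gradient could run on its own worker."""
    w = [0.0] * dim
    n = sum(len(s) for s in shards)
    for _ in range(epochs):
        # "Map": one partial gradient per shard (sequential here for simplicity).
        partials = [shard_gradient(w, s) for s in shards]
        # "Reduce": add the partial gradients, then take an averaged step.
        total = [sum(col) for col in zip(*partials)]
        w = [wi - lr * gi / n for wi, gi in zip(w, total)]
    return w

def predict(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

# Toy data: a bias feature of 1.0 plus one value; label is 1 when the value is positive.
shards = [
    [([1.0, -2.0], 0), ([1.0, -1.0], 0)],
    [([1.0, 1.0], 1), ([1.0, 2.0], 1)],
]
w = train_data_parallel(shards, dim=2)
print(predict(w, [1.0, 1.5]) > 0.5)  # True
```

Because the per-shard gradients are simply added together, the shards can be processed in any order or in parallel without changing the update; that additive structure is what makes logistic regression comparatively easy to distribute, even if doing it at Google scale is not.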


Moving on, we entered the Product Design workshop, which was fun and turned out to be a wonderful learning experience, giving an interesting insight into product management. We learnt about Google's APM (Associate Product Manager) program, a two-year product management training program designed specifically for those who love working with engineers and coming up with ideas for new products; it is normally taken up by people who are not so strong at programming and/or do not enjoy it, many of them at the undergraduate or graduate level. Mind you, product managers are not above engineers in the hierarchy; they are simply the people who understand what products users need and then work with engineers to build those products. At the end of the session we were divided into six groups, and each group had to work on one of four product ideas; my group got the School Diary idea, which we had to chalk out as a product with various features. The following pictures were taken during the product design workshop:


We then moved on to the poster show, where each of us presented our research, and it was wonderful to get feedback from fellow scholars and finalists as well as Google engineers and interns; many lines of future work came to mind after those interactions. We were then taken on a half-hour tour of the Google Zurich office, and the work environment there was fantastic, with plenty of secluded compartments where engineers could lie down for a while, think alone (you know, during those tough programming phases when you're badly stuck on a problem) or even talk on the phone. The entire office was full of free snacks, coffee, various other beverages and ice cream; there was a Sky Lounge, a Jungle Lounge, a Water Lounge and my personal favorite, the restaurant named Fork() (yes, it is inspired by the fork() system call in Unix/Linux). The day's final activity was a talk by Alan Eustace, SVP of Knowledge, straight from Mountain View via video conference. Alan Eustace pioneered the Google Anita Borg program; he told us a bit of the history behind the scholarship and about the time he spent with Dr. Anita Borg, along with some funny stories about his daughter and how he explains Computer Science to her.

The second day was even more fun for all of us, as most of it was divided into parallel sessions based on the attendees' year of study and research interests. Following is the list of parallel sessions; the ones I attended are marked with an asterisk (*):

09:00 - 11:00    Parallel session 1: Android coding challenge
09:00 - 11:00    Parallel session 2: UX web design
09:00 - 11:00    Parallel session 3: SRE Workshop
09:00 - 11:00    Parallel session 4: Natural Language Processing and Research at Google *
11:30 - 12:30    Parallel session: Women in Computing *
11:30 - 12:30    Parallel session: Mind the Gap
11:30 - 12:30    Parallel session: Employability Session
14:45 - 16:15    Parallel session: Day in the Life of an Intern *
14:45 - 16:15    Parallel session: Interview workshop
16:45 - 17:45    Career Panels: BSc students
16:45 - 17:45    Career Panels: MSc students
16:45 - 17:45    Career Panels: PhD students 1
16:45 - 17:45    Career Panels: PhD students 2 *

The session on Natural Language Processing and Research at Google was perhaps the most awaited and popular one, with most of the attendees opting for it. During the one-hour Natural Language Processing session, Enrique Alfonseca, who heads the Natural Language Processing division at Google Zurich, gave a talk on his recently accepted ACL 2013 paper, which proposes a generative headline system that can augment Google's Knowledge Graph. The problem is motivated by the observation that news headlines are rarely objective and every news agency reports an event differently. From a computational perspective, such noisy headlines make events hard to detect, which makes augmenting event-based knowledge bases such as the Google Knowledge Graph a significantly challenging problem. The proposed model exploits the relatedness of events in news collections, using syntactic patterns obtained through dependency parsing and a Noisy-OR Bayesian network. Those interested can read the full paper here. Next up was a panel discussion on Research at Google with David Harper (one of Bruce Croft's PhD graduates). This was a highly interactive panel, with research scientists (who were once renowned academics) giving insights into what it's like to work on real-world products and systems used by millions of people around the world; it turns out to be a whole new experience, with a satisfaction quite different from the joy of getting your research published. I asked two questions during this panel, with my own plans for a research internship during my PhD and my ambition to remain in academia in mind. At the end of the panel session David Harper mentioned an important resource that describes in detail how Google approaches research: a Communications of the ACM article that can be accessed here.


We then attended the Women in Computing panel, which was very interesting for women computer scientists. It mostly centered on how women engineers at Google manage an engineering job in industry while raising kids. Google Zurich has a flexible policy for mothers-to-be: up to 8 months of maternity leave is granted, along with the option of part-time work and of working from home. Beyond that, it is up to the woman herself how she balances the engineering role with her kids, and it all comes down to priorities; for a woman, kids are always the priority, or as one Google engineer very nicely put it, "Engineering work can be done by someone else but only I can be a mother to my child". Another interesting perspective concerned the quality of the time spent with your kids: according to one woman engineer at Google, when you know you are always with your kids you take it a bit lightly and the quality of that time suffers, whereas if you are working you know that all the time you spend with your kids has to be quality time. Shifting the focus a bit, I asked (without naming names, of course) about the assertion by some women in CEO positions that very few women hold such roles, and what the women engineers at Google thought about it. They replied that it all comes down to a person's priorities: CEO positions don't matter that much as long as you enjoy both your work and your life.

In the panel session on Day in the Life of an Intern, we were told about the work routine in various intern positions at Google. There are basically three intern positions at Google: APM (Associate Product Manager), which involves managing Google products and thinking up new features; SWE (Software Engineering), which involves the programming behind Google products; and SRE (Site Reliability Engineering), which concerns keeping the Google site up and running round the clock. A typical day of an APM intern involves lots and lots of meetings with engineers, discussions of product features, a great deal of email and, among other things, boosting the product team's motivation. A SWE intern spends most of the day programming on assigned tasks, with little or no administrative work. A typical day for an SRE intern involves being on call and rushing to respond when a complaint arrives that the site is down.

The last session I attended was Career Panels: PhD students 2, which mainly centered on the career options open to PhD students once they finish their PhD. There was a very interesting tension between academia and industry in this panel, with some of the panelists honestly confessing that they miss academia, especially interacting with students and the joy of getting research published, while also admitting that one of the strongest motivations for moving from academia to industry is money. At a company such as Google things are done differently: there is less freedom than in academia to work on things of your choice, and the style of work is product-centric rather than research-centric; you cannot afford to solve a research problem in its entirety, because the product release has a deadline that must be met. Note that this is different from other Web industry giants like Microsoft and Yahoo!, which both have a separate research division, whereas Google has merged research scientists with engineers in its product teams in order to meet its ambitious goal to "organize the world's information and make it universally accessible and useful."


Saturday, July 13, 2013

The Journey Towards Becoming a Google Anita Borg Memorial Scholar

Those of you who know me and have been following me may know that I recently received the Google Anita Borg Memorial Scholarship for Europe, the Middle East and Africa. This is the first time a woman from Pakistan has won this prestigious scholarship since its inception in 2007. Over the past few weeks several people (especially women in Pakistani tech circles) have asked me to share my journey towards this scholarship and the hurdles I had to overcome along the way. So here I am, sharing my story for those who asked.

First and foremost, it would not have been possible without the support of two very important men in my life: my father and my husband. My father played a huge role, as he gave me the best education possible throughout my childhood, building strong foundations for me in my early days. I firmly believe my husband to be one of the finest programmers in the world, and those who have worked with him can bear testimony to that. He has played a huge role in this success, as he is always pushing me to polish my programming skills and giving me useful advice at every stage of life, be it on technical matters or anything else. For a woman to be successful, the support of the male members of her family is very significant, and it is what completes the life of a woman in the family. The media continuously reports negative things, but the reality has been different throughout my life and in the lives of those I know back home in Pakistan. No member of a family functions better when the family is split apart; I would compare a family to a running engine in which each part plays an important role.

Coming back to the story, it all began with the nights I used to spend solving tough mathematical problems during my O-level days. Compared to the matriculation system, O-levels have a considerably different and tougher Mathematics curriculum (including topics such as Probability and Statistics, Differentiation and Vectors, which Matriculation students normally study at a later stage). More than the curriculum, I well remember the role of my teachers, who kept reiterating their pride in me whenever I solved a Mathematics challenge problem (our O-level book had some in every activity, and normally I was the only one in class who solved them). The joy of being praised by your Maths teacher for solving a problem no one else in the class could solve was simply out of this world, and it kept me going until the undergraduate stage, when I had to choose my major. Given my love for Applied Mathematics, Computer Science was a natural choice. This new world both amazed and baffled me, for I had no prior programming experience; but challenges are among the biggest motivators on the path of learning, and history bears testimony to that: the greater the challenges in one's life, the more one learns in overcoming them.

Right at the beginning of my undergraduate years I came across some highly innovative and selfless people, and together we formed BloX, the first ever open source student body at our university. Through BloX I passed on useful Linux knowledge to my juniors and helped them get a grip on fundamental Linux concepts. Mind you, I have by now completely given up on Windows and am a proud Linux convert; I also credit a great deal of my success to this wonderful operating system, which always teaches you so much about the world of computing. Many of those who had joined BloX in its early days later left; it turned out they had been drawn by the glamour of it all, as BloX got to represent the Department of Computer Science, Karachi University at ITCN Asia 2004. Soon after ITCN Asia 2004, when actual Linux development had to be done, not many wanted to do it, as it was not the "in thing in the market" and could not guarantee a job, which seemed to be the only goal of Computer Science undergrads in those days (and remains so to this day); very few cared about the science behind computers. We finally had to dissolve BloX, but the experience left us more motivated and charged; today a smile comes to my face when I think of those fun-filled days. My colleague (who is now my husband) and I kept doing fun things in the world of Computer Science: winning software competitions along the way, developing our own research-based Linux distribution, PAL Linux, which was distributed to all students of the final-year Parallel Computing course, and finally getting our very own research paper published (it was about redefining images so as to enhance semantic search over them).
All this while our classmates had started internships and jobs at reputed software houses in Pakistan and were already making money, which added to the peer pressure; still, we kept going despite the odd questions we faced about our careers after the BS (Computer Science). I did join a small, unknown software house, and I well remember my classmates criticizing this decision; however, that was only to keep some money coming in, since we needed funds both for our marriage and for an MS abroad (by this time we had made up our minds to pursue an advanced degree in Computer Science).

South Korea seemed the best choice for both of us, as tuition fees were waived, a stipend covered living expenses, and KAIST happens to be the MIT of Asia. I felt even more enthusiastic when Professor Kyu-Young Whang of the Database and Multimedia Laboratory at KAIST agreed to support our application as married students. To many, South Korea was an unusual choice: underestimating South Korea as a significant player in the scientific world, everyone seemed to advocate the United States as the ultimate destination for a Master's degree in Computer Science. We knew we had made the right choice, and time bore testimony to that. KAIST turned out to be a life-changing experience, and I can easily say I learnt more there than some of my seniors doing an MS in Europe or the United States. Professor Whang is an ACM Fellow in the database community and a Computer Science legend in his own right; he made us spend long hours in the lab (sometimes we worked more than 16 hours a day, and around my Master's thesis defense I spent three days and three nights straight in the lab, with my husband cooking noodles for both of us in the snow on a portable stove). I owe much of my Computer Science research skill to Professor Whang and his PhD and postdoctoral students, who taught us how to come up with a research statement, identify open issues in the current state of the art of a field, design solutions to a research problem in Computer Science, program in the best way possible so as to keep systems scalable and useful for generations to come, and write papers as clearly as possible, adhering strictly to the scientific method of passing on knowledge. My article on "Programming vs. Coding" grew out of some of Professor Whang's advice during his Database class, and I mentioned this article in my Google Anita Borg application.
All this time we maintained our links with Pakistan, and students kept writing to me for advice on career paths; I took the time to answer them and to stay in touch with my roots back home.

For our respective PhDs, we wanted to explore a different region, and we chose Europe for its flexible, caring supervisors and excellent research opportunities to come up with our own problem statements. Adding to this is the wonderful experience of working with my current PhD supervisors, Colm O'Riordan and Gabriella Pasi, who always offer enriching research directions in information retrieval and fuzzy logic; they provided us with what was missing in South Korea, i.e., the opportunity to form research networks around the world and the freedom to pursue the paths we think best for ourselves. Lastly, and most significantly, we still maintain a presence in Pakistan via our own research lab within the Computer Science Department of the Institute of Business Administration, Karachi, Pakistan, an experience that could be characterized as both exciting and frustrating. At times it is really painful to argue for hours with people in academic circles back home about the usefulness of a research lab and why it is essential to conduct scientific research. In countries like Pakistan, universities focus mainly on teaching, as there is insufficient support for research (mainly due to economic problems). I am constantly working to break this culture: from time to time I work with students, assisting them with their theses or final-year projects and motivating them to pursue novel research ideas in the domain of Web Science. The Web Science and Technology Research lab, despite still being in its infancy, has been successful; last year it was represented at the International Conference on World Wide Web, one of the most prestigious conferences in my field.

To sum up, here are some tips for those who asked:
1) Value people and treat them with respect, as you can learn something from each and every person you come across. Take the time to reply to emails from those who reach out to you or expect something from you, even if it's a very small matter; it makes a lot of difference at the end of the day.
2) Speak less and do more; there are times when actions mean everything, and you have to stop ranting about things like success and instead take steps to achieve your goals. Remember, procrastination is one's worst enemy.
3) Don't keep complaining about your circumstances, as they are never easy for anyone. I remember a time when I did not have the money to buy a computer table and had to program sitting on the floor; I didn't complain then, and today I own around three computers/laptops in various parts of the world.
4) Communication skills matter a lot, and it is extremely important to market yourself in the best way possible. Everyone has something special; they just need the right way to market it.
5) Love technical stuff: not just gadgets, but the science behind things. Do not kill your intellectual curiosity by settling for "glittering" gadgets; instead, focus on the innovative ideas stemming from them.
6) Do not pay much heed to critics of your decisions, for they are there to make you firmer. This does not mean ignoring meaningful advice from people who matter, but remember that most criticism comes from people who do not know.

To end on a humorous note, I made a funny meme capturing my reaction when people acknowledged my achievement in big words. It is not intended to make anyone feel bad; it is just pure humor.


Saturday, November 10, 2012

Mining Tweets of World Cup T20 Match between India and Pakistan: Interesting Insights from Social Network Analysis

I have been absent from this space I call my blog for quite some time now, and not without reason. The past few months have been extremely busy, with lots of traveling (Milan, Venice, Rome, Nijmegen and Copenhagen, all in three to four months' time) and, of course, the never-ending paper submissions. As I explained in my previous post on online education initiatives, I am also taking Computer Science online courses on Coursera, and this semester I took a very interesting course on Social Network Analysis by Lada Adamic of the University of Michigan. Although I have taught some aspects of Social Network Analysis myself during a summer course at the Faculty of Computer Science at IBA, Karachi, I found this course intriguing, and the way Lada enriched it with cool applications of SNA was simply amazing.

As an optional part of this course, students could submit a programming project, and I thought there could be no better opportunity to submit a part of the TweetCric project being undertaken by our research group. It's always good to get early feedback on your work to find useful, innovative directions, so I decided to blog about my Social Network Analysis project. Readers are welcome to suggest new directions or give their feedback in the comments to help us with this project. Following is a description of the project for interested readers of my blog:

Social media applications have considerably influenced the lives of millions, and every day there is a huge volume of updates to social networks such as Facebook and Twitter. As of March 2012, more than 400 million tweets were being posted on Twitter each day. The volume of tweets rises sharply during sporting events, as many sports fans now use social media as part of their viewing experience. Users describe this as an experience full of pleasure and fun, as in the following Facebook status update posted during the recent World Cup T20 match between India and Pakistan:


"Facebook comments are more interesting than the match. Already more than two pages of comments. Looks like PakInd Vs Facebook"

Interestingly, the huge amount of content produced during sporting events can be used to analyze players' performance; in light of such analysis, sports managers can decide future strategies, and the notion of crowd-sourced sports critics can be realized in practice. Researchers have already begun to explore using this huge volume of user-generated content to address various research problems, such as event detection and video annotation for sports summaries [1, 2]. In this work, we argue for using this crowd-sourced content to help sports strategy analysts and decision-makers. Specifically, we use social network analysis to highlight the significant players in the match, and we analyze why social network analysis methods single out these players.

Social Network Modeling
The data was obtained using the Twitter Search API during the epic match held on September 30th, 2012. A Python script queried the Search API at half-hour intervals, collecting fresh tweets as the match progressed. In total, we collected a sample of 43,450 tweets with the hashtag #PakvsInd during the match.
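The collection script itself is not part of this post. As a hedged illustration of the polling pattern just described, here is a minimal Python sketch in which the actual API call is hidden behind a hypothetical `fetch(query, since_id)` function (a stand-in, not the real Twitter client or endpoint). Because consecutive polls can return overlapping results, tweets are kept once per tweet id, and the largest id seen so far is passed back as a `since_id`-style cursor so that the next poll only asks for newer tweets.

```python
import time

def poll_search(fetch, query, rounds, interval_s=30 * 60, sleep=time.sleep):
    """Poll a search function repeatedly and keep each tweet once, keyed by id.

    `fetch(query, since_id)` is a hypothetical stand-in for the real API call;
    it must return a list of dicts with at least "id" and "text" keys.
    """
    tweets = {}
    since_id = None
    for i in range(rounds):
        for t in fetch(query, since_id):
            tweets.setdefault(t["id"], t["text"])  # skip tweets already collected
        if tweets:
            since_id = max(tweets)  # ask only for newer tweets next time
        if i < rounds - 1:
            sleep(interval_s)  # half-hour gap between queries
    return tweets

# Usage with a fake fetch whose two batches overlap on tweet 2.
batches = iter([
    [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}],
    [{"id": 2, "text": "b"}, {"id": 3, "text": "c"}],
])
fake_fetch = lambda query, since_id: next(batches)
result = poll_search(fake_fetch, "#pakvsind", rounds=2, sleep=lambda s: None)
print(len(result))  # 3
```

Passing `sleep` in as a parameter keeps the half-hour wait in real runs while letting the sketch be exercised instantly with a no-op.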

We modeled the social graph of the players and commentators using the text content of the tweets. First, using Wikipedia and ESPN CricInfo as external resources, we compiled a list of players and commentators relevant to the India-Pakistan cricket match. This list was then used to detect tweets mentioning any player or commentator; the following list shows some sample tweets:

  1. hafeez goes, 15 from 28 balls.. idiot, wasted his time big time. game over. #pakvsind
  2. like the world cup of pakistani batsmen falling against yuvraj. kamran also departs edging to dhoni. pak 56/4 after 9. #pakvsind
  3. rt @maria_memon: rt @maria_memon: afridi! quit playing games with our hearts....our hearts....#pakvsind
  4. hafeez is the reason for todays batting performance.... after nazir he put all of the team under pressure! #pakvsind
  5. is dhoni trying to piss off pakistanis by bringing in kohli? #pakvsind
We now explain how we define the nodes and edges in our social network of players and commentators. Each player or commentator is a node, and an edge connects two players/commentators if they co-occur in a tweet. For example, tweet 2 above yields edges between each pair of yuvraj, kamran and dhoni. In total, 8,587 tweets (19.8% of the sample) contained a mention of at least one player or commentator.
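The edge rule above fits in a few lines of code. Below is a minimal Python sketch, assuming a small hand-made set of lowercase, single-token names (the real list compiled from Wikipedia and ESPN CricInfo would also need aliases and spelling variants); it is an illustration, not the TweetCric code. Each tweet is tokenized, the mentioned names are collected, and every pair of co-mentioned names adds 1 to that edge's weight. Run on the five sample tweets, it reproduces the yuvraj/kamran/dhoni triangle from tweet 2.

```python
import re
from collections import Counter
from itertools import combinations

def build_cooccurrence(tweets, names):
    """Return (edge weights, number of tweets mentioning at least one name).

    An undirected edge (u, v) gains 1 for every tweet mentioning both u and v.
    `names` is assumed to be a set of lowercase, single-token names.
    """
    edges = Counter()
    mentioning = 0
    for text in tweets:
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        found = sorted(n for n in names if n in tokens)
        if found:
            mentioning += 1
        for u, v in combinations(found, 2):  # sorted, so each pair has one key
            edges[(u, v)] += 1
    return edges, mentioning

sample = [
    "hafeez goes, 15 from 28 balls.. idiot, wasted his time big time. game over. #pakvsind",
    "like the world cup of pakistani batsmen falling against yuvraj. kamran also departs edging to dhoni. pak 56/4 after 9. #pakvsind",
    "rt @maria_memon: rt @maria_memon: afridi! quit playing games with our hearts....our hearts....#pakvsind",
    "hafeez is the reason for todays batting performance.... after nazir he put all of the team under pressure! #pakvsind",
    "is dhoni trying to piss off pakistanis by bringing in kohli? #pakvsind",
]
players = {"hafeez", "yuvraj", "kamran", "dhoni", "afridi", "nazir", "kohli"}
edges, mentioning = build_cooccurrence(sample, players)
print(sorted(edges))
# [('dhoni', 'kamran'), ('dhoni', 'kohli'), ('dhoni', 'yuvraj'), ('hafeez', 'nazir'), ('kamran', 'yuvraj')]
```

The `mentioning` counter corresponds to the "tweets containing a mention" figure reported above; the edge weights can be exported to Gephi as a weighted edge list.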

The following figure shows the visualization obtained from this social network (the graph was generated with Gephi):

Modeled social network of players/commentators during World Cup T20 India-Pakistan match
As the figure shows, there are three communities within this social network. Nodes are sized by betweenness centrality, and Hafeez is the node with the highest betweenness: he captained the Pakistani team in that match, and according to most of the tweets Pakistan lost because of his poor captaincy, poor field placements and poor batting. The node with the second-highest betweenness, Kohli, was man of the match and the top scorer, leading India to a comfortable victory. Hence, social network analysis gives important insights into sporting events. Natural language processing, as an alternative approach, seems to lack the precision and efficiency that social network analysis offers. Our team has long argued for a hybrid approach that combines natural language processing and social network analysis to address research questions in Information Retrieval and Web Information Systems, given the limited scalability and speed of natural language processing alone [3].
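For readers who want to see what betweenness actually measures: a node's betweenness counts how often it lies on shortest paths between other pairs of nodes, so a player who links otherwise separate groups of mentions scores high. Below is a minimal pure-Python sketch of Brandes' algorithm for unweighted, undirected graphs, returning unnormalized scores. It ignores edge weights (the post does not say whether weights were used in Gephi), and the toy graph is hypothetical, not the real match network.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness centrality via Brandes' algorithm.

    adj: dict mapping each node to a set of neighbours (undirected, unweighted).
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2.0 for v, c in bc.items()}

# Hypothetical toy graph: hafeez bridges nazir, afridi and the dhoni-kohli branch.
adj = {
    "hafeez": {"nazir", "afridi", "dhoni"},
    "nazir": {"hafeez"},
    "afridi": {"hafeez"},
    "dhoni": {"hafeez", "kohli"},
    "kohli": {"dhoni"},
}
bc = betweenness(adj)
print(max(bc, key=bc.get), bc["hafeez"], bc["dhoni"])  # hafeez 5.0 3.0
```

In the toy graph, five of the ten node pairs can only reach each other through hafeez, which is exactly the kind of bridging role that makes a heavily discussed captain stand out in the real network.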

We now analyse the communities within this dataset. The blue community consists mostly of Indian players, so it makes sense that they form a separate community. However, the inclusion of Misbah and Ajmal in this community is odd, since both are Pakistani players. Further analysis revealed why: Ajmal took the important wicket of Sehwag, which placed him in that community, and a single tweet mentioning Misbah together with Ajmal pulled Misbah in too. The dark green community consists mostly of Pakistani players, the exception being Dhoni, the Indian team captain; this happens because Twitter users compared Hafeez's captaincy with Dhoni's, pulling Dhoni into that community. Lastly, the aqua green community contains players who did not play in the match, with the exception of Afridi, who was pulled into it because Pakistani cricket fans tweeted that he should be dropped, mentioning him alongside players who were not in the side.
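The post does not say how the three communities were detected. A common choice in Gephi is its Modularity statistic, which is based on the Louvain method and tries to maximize Newman's modularity Q: the fraction of edges falling inside communities minus the fraction expected if edges were placed at random with the same degrees. As an assumption, the minimal Python sketch below only scores a given partition by Q; it does not search for the best partition, it ignores edge weights, and the six-node graph (two triangles joined by one edge) is hypothetical.

```python
def modularity(adj, communities):
    """Newman's modularity Q for an undirected, unweighted graph.

    adj: dict node -> set of neighbours; communities: list of sets of nodes.
    Q = sum over communities c of [ L_c / m - (d_c / (2m))^2 ], where
    m = number of edges, L_c = edges inside c, d_c = sum of degrees in c.
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    q = 0.0
    for c in communities:
        inside = sum(1 for u in c for v in adj[u] if v in c) / 2  # each edge seen twice
        degree = sum(len(adj[u]) for u in c)
        q += inside / m - (degree / (2 * m)) ** 2
    return q

# Hypothetical graph: triangle a-b-c and triangle d-e-f, joined by the edge c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
split = [{"a", "b", "c"}, {"d", "e", "f"}]
moved = [{"a", "b", "c", "d"}, {"e", "f"}]
print(round(modularity(adj, split), 3))  # 0.357
print(round(modularity(adj, moved), 3))  # 0.122
```

A modularity-maximizing method places each node wherever the overall Q comes out higher, so a handful of co-mentions can be enough to pull a player such as Misbah into a community where one would not expect him.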

Finally, as I mentioned at the beginning of this post, any feedback or ideas are welcome. Students interested in joining this project are welcome to contact me personally via email or social networks.

References:
[1] J. Nichols, J. Mahmud, and C. Drews. Summarizing sporting events using twitter. In Proceedings of the 2012 ACM international conference on Intelligent User Interfaces, IUI ’12, pages 189–198, New York, NY, USA, 2012. ACM.
[2] A. Tang and S. Boring. #epicplay: crowd-sourcing sports video highlights. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’12, pages 1569–1572, New York, NY, USA, 2012. ACM.
[3] A. Younus, M. Qureshi, F. Asar, M. Azam, M. Saeed, and N. Touheed. What do the average twitterers say: A twitter model for public opinion analysis in the face of major political events. In 2011 International Conference on Advances in Social Networks Analysis and Mining, pages 618–623. IEEE, 2011.

Friday, May 18, 2012

Online Education Initiatives: A Hope for Education in Less Developed Countries


I well remember my time in the Computer Science Department of Karachi University, when teachers who did not take their classes really annoyed me. My classmates would call me crazy for being so nerdy, but I knew this was a valuable period of our lives that would never come back. This was the age at which the mind is ready to absorb all kinds of knowledge, and thanks to our messed-up education system (not to mention the politics that pollutes it) it was literally being wasted. The dilemma of technological sciences such as Computer Science in less developed countries like Pakistan is that they are more hype than science. In my part of the world, students flock to Computer Science to get good jobs after graduation: of course this is a necessity and part of the point of a good education, but whether it should be the only goal is the real question we should address.

I remember a frustrating moment when our Object-Oriented Programming teacher gave us the option of learning OOP concepts with either C# or C++. Unfortunately, most of the class went for C# because it was in demand in the job market. At that point I realized how tough a time Computer Science would have in Pakistan, and this remains true to this day. Sadly, students and teachers alike have promoted the job-oriented study model. This has led to a myth that Computer Science is all about sitting at a desk writing code in .NET or PHP (or any other programming language for that matter).

Meeting the well-known scientist Rakesh Agrawal from Microsoft Research confirmed my assertions about the pathetic state of technological sciences in countries like India and Pakistan. He shared the same dissatisfaction as me and strongly criticized the industry in less developed countries. Equally sad is the state of the national universities in South Asia, where the situation is changing at a very slow pace. When Stanford announced its online courses, I saw acquaintances in my social network sharing the news. The ones most excited about these online courses were undergraduate students from institutes in my country. As I see it, this is a silver lining amidst the dark clouds. Online education initiatives like Coursera, EdX and Udacity will now give students from all over the world access to quality education. In my opinion, this is a huge step towards bridging the digital divide, and it is now up to students in the developing world to make the most of this opportunity. Unlike the situation in my undergraduate days, today's connected society gives easy and massive access to knowledge. I feel students today are far more blessed than students of my time.

What started as an educational initiative by the accomplished Stanford Professors Daphne Koller and Andrew Ng has now turned into a global phenomenon, with the best universities contributing to make knowledge open for all. If studying at one of the world's reputed universities (Stanford University, MIT, Harvard, University of Michigan, University of Pennsylvania, etc.) was ever your dream, there can be no better time to pursue it. Some students might take this as an exaggeration, but I say it after personally taking two online courses this semester and enjoying them to the maximum. Coursera's statistics also confirm the value that online education now adds to universities: they could never have reached this scale otherwise. As Andrew Ng puts it: "I normally teach 400 students," Ng explained, but last semester he taught 100,000 in an online course on machine learning. "To reach that many students before," he said, "I would have had to teach my normal Stanford class for 250 years."



It is a generally held notion that the academic culture and teaching styles in our part of the world are outdated and boring. Having spent quite some time in Pakistani academic circles, I can certainly confirm this. For the most part, higher-education circles in developing regions confine ideas to an academic document on a shelf. The top research universities of the world work quite differently. Students have always wanted to know how the ideas they study in the classroom apply to the real-world problems around them. With world-class Professors offering online courses, there is now an opportunity to get many of those questions answered.

Online education as a phenomenon is not new, and for years people in less developed regions have been skeptical of it. Coursera and similar initiatives are quite different, however. Their revolutionary ideas are testing, grading, student-to-student help and certificates for completing a course. Daphne Koller, a Stanford computer science professor who founded Coursera with Ng, explained this in her talk at LinkedIn last week: "It will allow people who lack access to world-class learning, because of financial, geographic or time constraints, to have an opportunity to make a better life for themselves and their families."



So the next time students come to me for advice on how to start with research or how to apply to foreign universities, I will recommend that they take a few courses related to their area of interest on Coursera or a similar platform. These initiatives come from the world's top universities, and they offer hope of revolutionizing higher education. Students from all over the world can now hear top-quality lectures and also do homework assignments and be graded. They can then receive a certificate for completing a course and use it to get a better job or gain admission to a better school.

Sunday, April 15, 2012

WWW2012 Poster: New Media vs. the Old Media


Today's social-media-savvy age has considerably changed the paradigm of traditional journalism. It has also led to new debates within the journalism and media industry. Supporters of social media call it a platform for the voice of the masses, while opponents dismiss it as gibberish and noise. Old-school journalism disregards the significance of an article's social media popularity on the premise that “journalism is not about feeding the masses with whatever crap they want to be fed with.”

It turns out that this debate is not as simple as it appears on the surface. What advocates of old-school journalism do not take into account is the age-old phenomenon that the social sciences research community calls “media bias”. A famous paper published in 2004 by the Department of Political Science at UCLA and the Department of Economics at the University of Missouri studies the bias of famous news outlets in the US. Since then there have been various attempts to study bias in traditional media platforms such as the New York Times, Fox News, the Washington Post, CBS and the Wall Street Journal. Most of these attempts come from the sciences (social science, political science, Computer Science). Science gives empirical evidence the utmost importance, and unfortunately the social media circles in Pakistan tend to ignore this angle altogether. Measuring bias across various forms of media is itself a huge research challenge. The solution is social media. Through its insights and popularity judgements, it can serve both as a platform for the voice of the masses and as a tool for measuring bias in traditional media. This is exactly what a team of researchers in IBA's Web Science group has done.

The crucial role of the media industry makes it all the more essential to have ways of verifying its content. This raises the natural question of how new media, namely social media, can help measure the biases inherent in traditional media. Researchers from the Institute of Business Administration, one of Karachi's most prestigious educational institutes, have answered some of these questions. They investigated differences between news appearing on traditional and social media platforms using publicly available data from the famous microblogging site Twitter. Being part of this team made me delve deeper into various aspects of media, both internationally and in Pakistan. My observation is that today's media tend to ignore the crucial role of social media and do not take popular demand into account. Based on this conclusion, we argue for a paradigm shift in how traditional media platforms perceive the new media landscape. The sooner they embrace this new world, the better their chances of survival.

Some technical details of the study warrant explanation. We used the Jaccard similarity metric from data mining to measure the difference in named-entity coverage between two sources. The first was the 16 million tweets posted during the Egypt uprising (tweets' data obtained from the TREC 2011 microblog track). The second was the New York Times articles corresponding to Egypt. The figure below shows our results:



It shows a significantly low value of coverage, with Jaccard similarity below 0.5 for all days, which we take as evidence of media bias. We further extended this study to a local level, covering Pakistani media outlets on a daily basis for the month of November. The extension uses topic models (specifically standard LDA and Twitter-LDA) to discover similar topics in the two media. A ranking function then computes the popularity of each news item on the two platforms. The resulting rankings are compared with a manually ranked list. The final result is that the ranks obtained from social media (tweets data) match the human-annotated ranks more closely than the ranks obtained from traditional media.
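For readers unfamiliar with the metric, here is a minimal sketch of Jaccard similarity between two sets of named entities for a single day. It assumes the named entities have already been extracted from the tweets and the articles; the extraction step is not shown. The entity sets below are hypothetical and are not data from the study. The function name `jaccard` is likewise an illustrative choice.

```python
from typing import Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two sets of named entities.

    Two empty sets are treated as having similarity 0.0 (a convention chosen here).
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical entity sets for a single day; not data from the study.
tweet_entities = {"Mubarak", "Tahrir Square", "Cairo", "Alexandria", "ElBaradei", "Suleiman"}
nyt_entities = {"Mubarak", "Cairo", "Suleiman", "Obama"}

print(f"{jaccard(tweet_entities, nyt_entities):.2f}")
```

Here the two sets share 3 entities out of 7 distinct ones, so the printed value is 0.43. A value of 1.0 would mean both sources mentioned exactly the same entities, and values well below 1.0 mean the coverage diverges. Computing this once per day gives a per-day coverage curve like the one described above.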

For those interested, here's the abstract of our paper:
It is often the case that traditional media provide coverage of a news event on the basis of journalists’ viewpoints - a problem termed in the literature as media bias. On the other hand social media have given birth to an alternative paradigm of journalism known as “citizen journalism”. We take advantage of citizen journalism to detect the bias in traditional media and propose a simple model for empirical measurement of media bias.

Note: This is part of a long-term project by the Web Science research group at Institute of Business Administration, Karachi, Pakistan and we welcome interested students to be a part of our project.

The slides for the work can be viewed here and the full 2-page poster paper can be downloaded from here.

Sunday, March 4, 2012

Three cheers for Professor Moon

A few days back, I read in my Facebook news feed an update from Professor Sue Moon that she is now a tenured Professor at KAIST, and I was immensely delighted. This post is a special tribute to Prof. Sue Moon from a student who did not get to spend much time with her. Whatever time I did spend with her played a huge role in my learning path. It all began in Spring 2009 when I took Professor Moon's course on Advanced Networking. At first she seemed hard to impress, but then I figured out that this was her way of teaching. The Advanced Networking course she taught us was special: it turned out to be one of the toughest and yet greatest learning experiences of my life. She had designed the course specifically with the struggles of young researchers in mind. Throughout the semester we were expected to read papers and write critiques of them. We also had to present some of the selected papers in class as if they were our own. This activity turned out to be quite hectic, and each student dreaded the day when he/she had to present. One strong reason for that was Professor Moon's fiery questions about the technical aspects of the paper. She spent hours polishing our presentation and paper-reading skills, asking us to read papers critically so as to highlight their strong and weak points. She also taught us a skill that is very valuable in the scientific community: captivating the audience when giving a technical talk. This rare skill is seriously lacking even among the best scientists of our community.

The semester ended and we all got back to our busy research lives at KAIST. Later in my Master's degree, I realized how extremely helpful her teaching and the way she groomed us in that course had been. She literally taught us how to fall in love with research, an ability quite rare even among graduate students at the world's top universities. She keeps her technical how-to talks on her Web page and I have gone through all of them. I would definitely recommend them to all aspiring Computer Science researchers out there.

I want to particularly thank Prof. Moon for all she gave me. Knowledge, in my opinion, is a priceless gift in itself, and I am at a loss for words to express my gratitude to her. Thank you, Professor Moon, for playing a role in my research path; your training has proven to be a great gift for me. My own Master's advisor, Professor Kyu-Young Whang, taught me the most during my stay at KAIST, and his training has also been invaluable in shaping me as a researcher. Even so, Professor Sue Moon is special because she is one of the most outstanding women in Computer Science I have known. This field surely needs more inspiring women like her. I hope to meet her some day to thank her in person.

Saturday, August 27, 2011

Visit to Russia: RuSSIR/EDBT Summer School

Although I constantly microblogged on Twitter during my trip to Russia, nothing replaces a detailed blog post when it comes to coverage. I want to keep a detailed archive for myself and for students of Information Retrieval (and related areas) around the world. Together with my husband and colleague Muhammad Atif Qureshi, I visited St. Petersburg, Russia from 14th August, 2011 to 20th August, 2011. We attended the prestigious Russian Summer School in Information Retrieval (RuSSIR), which was co-located with the Russian Young Scientists' Conference where we presented our research work. This year's RuSSIR was quite special because the EDBT summer school was also co-located with it, so the breadth and depth of the lectures at the school was immense. Here is a brief overview of the lecture sessions I attended, along with some good news for students in Karachi, Pakistan.

SocM Session: The Social Media Mining session was conducted by two well-known industry researchers, Vladimir Gorovoy of Yandex and Yana Volkovich of Barcelona Media Innovation Center. It was highly interactive and included a hands-on recommendation task for which students were given a real dataset from Yandex Market. Here is a link for students who wish to try it out: Yandex Market practical task from RuSSIR. The session covered various aspects of mining social media data. It began with an apt observation borrowed from Google's analytics evangelist Avinash Kaushik: "Social media is the hot thing today, almost every one seems excited to get involved in it but no one actually knows how." This session covered that "how". It offered a glimpse into graph mining methods (PageRank, TunkRank and TwitterRank being some examples) and models for mining the opinions in customer reviews. It also covered social media engagement metrics and social innovation platforms for the future. In short, it was an extremely engaging and knowledge-rich session, particularly helpful for social media analytics students. I learned a lot during this session and am particularly thankful to Dr. Yana Volkovich for some wonderful suggestions that will really help me in my own research.
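To give a flavour of the graph mining methods mentioned in this session, here is a minimal sketch of plain PageRank computed by power iteration on a tiny graph. It is not TunkRank or TwitterRank, which adapt the idea to Twitter's follower structure. The function name, the damping factor of 0.85 and the "who retweets whom" graph are illustrative assumptions, not material from the session.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[str, float]:
    """Plain PageRank by power iteration.

    `graph` maps every node to the list of nodes it links to; every link
    target must also appear as a key. Nodes without out-links (dangling
    nodes) spread their rank uniformly over all nodes.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        # Teleportation share plus the redistributed dangling rank.
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            out_links = graph[v]
            for w in out_links:
                # Each node passes its rank equally along its out-links.
                new_rank[w] += damping * rank[v] / len(out_links)
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical "who retweets whom" graph, purely for illustration.
retweets = {
    "alice": ["bob"],
    "bob": ["carol"],
    "carol": ["alice", "bob"],
    "dave": ["carol"],
}
for user, score in sorted(pagerank(retweets).items(), key=lambda kv: -kv[1]):
    print(f"{user}: {score:.3f}")
```

In this toy graph nobody retweets "dave", so he only receives the teleportation share and ends up with the lowest score. Users who are retweeted by influential users rise to the top.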

Plenary Session (Knowledge Harvesting from Web Sources): Although it was a bit far from my own research area, I found this session very informative and full of pointers for new research ideas. Gerhard Weikum (Research Director at the Max-Planck Institute for Informatics (MPII) in Saarbruecken, Germany) presented a comprehensive overview of research methodologies that can turn the Web into a large-scale knowledge base. Academic examples of such knowledge bases include DBpedia, KnowItAll, ReadTheWeb and YAGO-NAGA, and industrial ones include Freebase and Trueknowledge. The tutorial presented research methodologies for knowledge harvesting, with examples such as the unification of WordNet and Wikipedia in YAGO. Other examples were identifying the long tail of instances of entity classes by harvesting textual snippets from the Web, and entity search through language-model ranking. Overall the session was intense and the slides were heavy with natural language processing material. It was definitely a great learning activity from the point of view of tools to use in your own research.

SentA Session: This session was one of the most exciting ones for me, as my own research centers on Sentiment Analysis. Professor Mike Thelwall, who heads the Statistical Cybermetrics Research Group at the University of Wolverhampton, delivered the talks in this session. They centered mostly on his group's sentiment strength detection tool, SentiStrength. We were taken through a live demonstration of the tool. Professor Thelwall then explained its various features, the underlying algorithms and its experimental evaluations in detail. The SentiStrength team has done a pretty good job of managing this tool. The best thing about it is that its word list, which marks each word's positive/negative strength, is publicly available for research purposes. During this session students were also introduced to machine learning methods for Sentiment Analysis, with detailed explanations of feature selection, gold standard creation and 10-fold cross validation. To sum up, this session was extremely useful for students wishing to make a career in Sentiment Analysis. I especially thank Professor Thelwall for his valuable suggestions on various aspects of the field.
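To illustrate the general idea of lexicon-based sentiment strength, here is a minimal sketch that reports a positive strength (1 to 5) and a negative strength (-1 to -5) for a text, loosely inspired by the dual scale SentiStrength reports. The sketch simply takes the strongest matching word on each side. The mini-lexicon and the function name are hypothetical; this is not SentiStrength's word list or algorithm, which also handles things like boosters, negation and emoticons.

```python
import re
from typing import Dict, Tuple

# Hypothetical mini-lexicon: word -> strength on a -5..+5 scale
# (NOT the SentiStrength word list).
LEXICON: Dict[str, int] = {
    "love": 3, "great": 3, "brilliant": 4,
    "bad": -2, "hate": -4, "awful": -4,
}


def sentiment_strength(text: str) -> Tuple[int, int]:
    """Return (positive, negative) strengths in [1, 5] and [-5, -1].

    Each side takes the strongest matching lexicon word; 1 and -1 mean
    that no positive or negative word was found.
    """
    positive, negative = 1, -1
    for token in re.findall(r"[a-z']+", text.lower()):
        score = LEXICON.get(token, 0)
        if score > 0:
            positive = max(positive, score)
        elif score < 0:
            negative = min(negative, score)
    return positive, negative


print(sentiment_strength("I love this phone but the battery is awful"))
```

Here the result is `(3, -4)`. Reporting the two sides separately lets a single text be both mildly positive and strongly negative at once, which a single polarity score would hide.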

ColIR Session: This was a short session conducted by Chirag Shah of Rutgers University. It touched on a completely new dimension of Information Retrieval: retrieval facilitated through collaboration. According to Professor Chirag Shah, the emergence of collaborative Web platforms has moved information retrieval in a new direction. The traditional view holds that IR is an individual activity. The Collaborative IR community challenges this notion by describing IR as a coordinated activity, and has demonstrated its ideas in both theory and practice. This session covered both the theory and practice behind collaborative IR situations, systems and evaluation techniques.

TopK Session: This session, presented by the two charming ladies Sihem Amer-Yahia and Julia Stoyanovich, was simply fantastic. It introduced us to a whole new approach to solving some of the toughest problems in social media, an approach that comes from the old, classical database field. The session centered mainly on Top-K processing, one of the well-known methods for ranked retrieval in the DB-IR research community. It was presented in a unique manner, with a special focus on search and information discovery on the Social Web. Such applications were discussed from two viewpoints: 1) efficiency (minimizing both space and time requirements) and 2) user satisfaction. Both researchers presented a comprehensive overview of their papers published at top Database and Information Retrieval venues: VLDB, ICWSM, SIGMOD and ACM HT. Their work on efficiency incorporated upper bounds into classical top-k algorithms (the threshold algorithm and the no-random-access algorithm) to reduce time and space costs. Their work on user satisfaction presented the idea of scaling user studies up to thousands of users by leveraging crowdsourcing platforms such as Amazon Mechanical Turk. I am currently reading these papers to look for ideas that can be applied to my own research in Social Media Analytics.
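For readers new to top-k processing, here is a minimal sketch of the classical Threshold Algorithm (TA) due to Fagin et al., using the sum of scores as the monotone aggregation function. The sketch assumes every ranked list contains every object, so random access always succeeds. It does not include the upper-bound refinements from the speakers' papers or the no-random-access variant. The example lists and names are made up for illustration.

```python
import heapq
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]  # (object, score) pairs sorted by score, descending


def threshold_algorithm(lists: List[Ranked], k: int) -> List[Tuple[float, str]]:
    """Fagin's Threshold Algorithm (TA) with sum as the monotone aggregate.

    Simplifying assumption: every list contains every object, so all lists
    have the same length and random access always succeeds.
    """
    lookup: List[Dict[str, float]] = [dict(lst) for lst in lists]  # random access
    seen = set()
    top: List[Tuple[float, str]] = []  # min-heap holding the current best k
    for depth in range(len(lists[0])):
        # Sorted access: read one more entry from every list in parallel.
        for lst in lists:
            obj = lst[depth][0]
            if obj in seen:
                continue
            seen.add(obj)
            # Random access: look up obj's score in every list and aggregate.
            total = sum(index[obj] for index in lookup)
            if len(top) < k:
                heapq.heappush(top, (total, obj))
            elif total > top[0][0]:
                heapq.heapreplace(top, (total, obj))
        # No unseen object can score above the sum of the last scores read.
        threshold = sum(lst[depth][1] for lst in lists)
        if len(top) == k and top[0][0] >= threshold:
            break  # early termination: the current top-k cannot be beaten
    return sorted(top, reverse=True)


# Hypothetical per-item scores from two sources (e.g. text relevance and
# endorsements by a user's friends); values are made up for illustration.
relevance = [("a", 0.9), ("b", 0.8), ("c", 0.5), ("d", 0.3)]
endorsement = [("b", 0.9), ("c", 0.7), ("a", 0.4), ("d", 0.2)]
print(threshold_algorithm([relevance, endorsement], k=2))
```

With k=2 the algorithm returns "b" and then "a" after reading only three of the four rows. At that point the weaker of the two current winners already scores at least the threshold (0.5 + 0.4), so no unread item can beat them. Avoiding full scans like this is where the efficiency gains discussed in the session come from.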

Here is an archive of my tweets from RuSSIR:

#RuSSIR sessions kick off with interesting presentation on Social Media Mining by @yvolkovich and Vladimir Gorovoy

Not many people know abt. a social network exclusively devoted to travel and hospitality: CouchSurfing


Can an online social network build enough trust to allow strangers to sleep on each others’ couches: Adamic's paper http://bit.ly/prdxTy


"The Web today is the largest knowledge encyclopaedia - we need it to turn it into a comprehensive Database" - Gerhard Weikum at #RuSSIR


In a very interesting talk by Mike Thelwall explaining the working of the famous sentiment analysis tool SentiStrength #RuSSIR


Automatic sentiment analysis has more or less the same accuracy as human sentiment analysis due to complexity of problem - Mike Thelwall


A look into inside of Yandex Market by @vgorovoy in session of Social Media Mining http://twitpic.com/66vfkp


Interesting talks in TopK session at #RuSSIR: essentially about converting social media research problems to traditional database problems


Researcher from Barcelona Media Innovation Center explains the science of social media mining #RuSSIR


Mention of work of KAIST's @sbmoon in #RuSSIR in Social Media mining lecture


Andrey Plakhov explains how entity-oriented search works at Yandex: Russia's search engine that has larger market share than Google Russia


Wonder where this rule came from #RuSSIR #Yandex http://twitpic.com/67e4w1


Sihem Amer-Yahia of Qatar Computing Research Institute continues day 3 of session on TopK Processing for Social Applications


Wonderful graphic by @yvolkovich on visualization of social media conversations during Spain protests #RuSSIR


Take-home from ColIR session: Science is all about collaboration unlike the Humanities #RuSSIR


AlJazeera English tracking information of users who visit the site for improved user experience - Sihem of QCRI at #RuSSIR


SearchTogether by Microsoft Research takes user-mediated Collaborative Information Retrieval one step ahead #RuSSIR


ColIR session: reason behind failure of Google Wave was the difficulty of the system requiring a 60-minute video tutorial #RuSSIR


Take-home of TopK #RuSSIR session: Social Web is full of challenges, our online social experience will be as good as we researchers make it


A week of super-duper learning and knowledge-sharing, intense discussions and lots of research take-aways. Hats off to #RuSSIR team!!

In short, Russia is a wonderful country and St. Petersburg is mind-blowing. Russian people are extremely hospitable and friendly, and what's best about them is their love and passion for Mathematics. All in all, Russia is a great place to visit for a Computer Science researcher. It is full of wonderful Computer Scientists, both established researchers and young science-aspiring students.

Finally, I am glad to announce that the Web Science group at the Institute of Business Administration will conduct an open seminar to introduce Pakistani students to some of the above-mentioned topics. Feel free to contact me with any suggestions for the seminar or any topic you would like included.

Thursday, July 28, 2011

Thoughts on Computer Science's 'Sputnik Moment'

I have not had the chance to blog of late. The past few months have been extraordinarily busy, with lots of research ideas in the pipeline. My colleague and husband and I are now also teaching the newly introduced "Introduction to Web Science and Technology" course at the Faculty of Computer Science, Institute of Business Administration. It has been a great experience working in Pakistan and trying to bring the Computer Science research culture here on par with international standards. It is a tough but fascinating journey.

Today I am writing at the request of a student who asked for my thoughts on the New York Times debate on "Computer Science's Sputnik Moment". It all began when I shared one viewpoint from this debate on my Facebook wall: that of Dr. Ed Lazowska (University of Washington), who believes Computer Science is central to our future. What particularly appealed to me was his statement below:

For students who want to change the world, there is no field with greater impact or leverage than computer science. Just take a look at the 2010 report by the President's Council of Advisers on Science and Technology, which characterized computer science as “arguably unique among all fields of science and engineering in the breadth of its impact.”


I received a private message from a student who disagreed with this viewpoint and shared Vivek Wadhwa's arguments from the same debate. The student, an alumnus of FAST-NUCES, wanted to know my view on the famous "tech bubble". He argued that today's students flock to Computer Science out of a desire to become the next Zuckerberg. In his view, the rise in Computer Science graduates is driven by gimmicky social media applications, which are a bubble despite being a major innovation. The premise is no doubt strong. However, it misses the difference between a scientist's approach and a technologist's approach. Wadhwa lacks the insight needed to grasp Dr. Lazowska's point. That point is that Computer Science, as a whole new science, has the potential to impact almost all other fields of science, making it indispensable for society today.

Wadhwa is an entrepreneur turned academic, and in my opinion this may be one reason he fails to grasp the essence of Computer Science as a whole. True, a large chunk of today's students chase the sparkling thing called social media. Yet their perception of Computer Science often changes once they explore the theoretical marvels of this field. A glaring example is the Web Science course I am teaching at the Institute of Business Administration. Initially students did not understand what the course was about or what they would learn in it, because, sadly, the Web to them meant ASP, PHP and HTML and nothing beyond. Once we began teaching the Web from a scientific perspective, students were simply amazed. They now think beyond SEO and are well aware of the science behind search engines. The point is that students may not see the real depth of science at first, and this is not just their fault. Those responsible for science curricula should be doing things the right way, and that will definitely make a difference.

Secondly, the point is not about lasting careers or high-paying jobs: it is about making a difference to the world through Computer Science. It is about pursuing Computer Science because your country needs you, not because you need a mere job! That is what is meant by a "Sputnik moment". Look at the reports that Lazowska links to. Computer Science is a key to the future because of its vast potential to deliver in areas that matter to our countries, such as health, energy, military surveillance and many others. I could go on and on, but what is really disturbing is the naive approach of our students, who have limited life goals and no vision on a broader scale.

Furthermore, the examples that Dr. Lazowska cites are Noam Chomsky, and Watson and Crick. Obviously, these people were not new kids on the block aiming to become the next Zuckerberg or chasing some social media setup. They were scientists with a vision: to further knowledge so that it serves as a foundation for generations to come. Many of the Google tools you play with and share on your social networks would not even exist without Computer Scientists like Dennis Ritchie, Ken Thompson and Brian Kernighan.

I would love to hear thoughts on this, particularly from Pakistani Computer Science circles, be it students or researchers. It is hard to engage people in Pakistan in an informed debate, even people who have done their PhDs or PostDocs, but it is always worth a try. So feel free to add your viewpoint in the comments section.

Tuesday, May 3, 2011

Interacting with Pakistani Students: Some Tips for Taking Up a Research Career

Every week or two I receive emails from students around the world asking for help with their research work and tips on getting into a research career. However, there is a marked difference between the emails I receive from Pakistani students and those from students in other parts of the world. European students in particular usually request my Master's thesis or papers and at times ask about the techniques we use in our papers. Similarly, students from Korea, China, Hong Kong, Taiwan, Egypt and Malaysia ask brilliant research questions and are focused on a specific topic. They even suggest novel extensions of existing work, including pointers to useful techniques we could incorporate in our work. In short, they have already identified a research path for themselves, and their questions aim at getting guidance on their chosen topic. Most students from Pakistan, on the other hand, have a single question: please suggest me some research topic or research idea?

Today I feel the urge to address this question from Pakistani students specifically, as I feel it has to be taken up carefully. My advice to such students is very simple: no one can tell you a research topic of your interest. I am sure Pakistani CS students will find this answer slightly confusing, so I will elaborate. Just as nobody can tell you what your favorite food is, no one can tell you which area of research to pick, because that depends entirely on your likes and dislikes. The fundamental problem is that these students do not even narrow down the research area they want to work in. Instead, they ask others to suggest a research topic for them. The question would be understandable if they had at least narrowed down their area. Dear students, please remember one thing. If someone else chooses your research area for you, you may be able to finish the task at hand. However, you will never feel the passion that research requires, and you will never enjoy your research. Research without enjoyment can never bear fruit.

"If you fancy a career as a researcher, you'll spend tens of thousands of hours on work over the next 10 years. The only way you're ever gonna spend 10,000 hours on research is only when you truly deeply love it. If something really engages you and makes you happy, then you will put in the kind of energy and time necessary to become an expert at it." - Click for Source

This is not meant to condemn the students. My point is to explain the mistake our students make, and I do not blame them for this state of affairs. Education in our country is more of a corporate business. Computer Science education in particular has been hijacked by technologists who know nothing about science. Many Professors do not know international standards of research and are not even aware of the best academic conferences in their field. In such a country, confusion among students is bound to exist. The problem is clearly a lack of guidance, and not many people wish to do anything about it. Some "technology experts" are even cashing in on this lack of guidance for their own fame and publicity. The situation is so pathetic that our students do not even know what a research paper is, let alone read one, and hence they fail to grasp the whole point of scientific research. When a student has no idea where to begin, how can he/she come up with a research topic?

Here are some tips based on my research experience. They are meant specifically for students who wish to do research but have no clear idea of how to proceed.

1. Narrow down your research area: if you do not know which research areas exist within the broad field of Computer Science, no worries. Simply visit the web sites of the Computer Science departments of famous research universities such as MIT, Stanford, Berkeley, CMU, Cambridge, Oxford, CUHK, ANU and KAIST. Browse to their research sections, where you will find many research areas listed. Do not just be fascinated by the name of a particular research area. Read more about it and then decide whether the area interests you or not.

2a. After step 1, i.e. identifying your research area, find out which conferences/journals are well known in that area. This task is not hard either: use DBLP, a Computer Science bibliography web site that indexes reputed conferences and journals. The name of a conference/journal will pretty much tell you whether it belongs to the field you have identified.
2b. In addition to step 2a, google the names of famous research groups working in your research area. For instance, if the field you have narrowed down is Social Computing, simply search for "Research Groups Social Computing" and then browse the work of the well-known groups in that domain.

3. After listing the conferences and journals in your area of interest, read the most popular and latest papers from those venues. For example, anyone interested in distributed systems would quickly discover Google's MapReduce paper as the de facto distributed computing standard and should read it. Another significant factor to look for is the number of citations a paper has received: read the most cited papers first to get a grip on the topic. Google Scholar will help you find the number of citations for a paper.

After reading 20-30 papers you will definitely come up with a crude idea, and refining that idea will of course require discussions with your advisor/senior researchers; you can even email the authors of some of the papers you read. Researchers love to share and increase knowledge, for that is the whole point of research. Unlike the corporate, commercial world, the research world does not like to hide things: it is all about knowledge-sharing, and a researcher who does not share his/her knowledge is never looked upon with respect.

Another handy tool that can help immensely in research is Twitter: although it is known as a social networking or micro-blogging service, many in the scientific community regard it as the new journal archive. Some of the groups you identify within your areas of interest will be active on Twitter, and you can follow them there for updates, for their latest work, and often for useful reading material that can help you a lot in your research. A note of caution, though: don't bother them with silly questions like "please tell me a research topic". They are mature researchers with top-quality students, and if someone approaches them with such questions they will regard that student as an alien, so this is where you have to be extra cautious.

Feel free to email me with any questions, and I will be glad to help. Please remember that a research career may seem attractive on the surface, but it requires much harder work than you would normally do in the software house or technology culture of Pakistan, because there are no ready-made sweets in research: crafting ideas and discovering scientific knowledge is what you will have to master, which of course takes years and years of effort.

Sunday, March 27, 2011

Invited Talk in Research Universities of Malaysia: The Web Goes Social

As I write this blog post, I am packing and getting ready to leave Malaysia, where I came to give invited research talks at three of its best universities, namely UTM, MMU and UPM. It was indeed a very hectic but fun-filled trip, with lots of interaction with faculty and students. Many of the faculty members at Malaysian universities gave an overwhelming response and inquired about research directions using social media. It was a great learning experience for me to answer their questions and to hold research discussions that could prove fruitful for joint research collaborations in the future.

At UTM I presented my work on Web crawlers, done in the Database and Multimedia Lab of KAIST as part of my Master's thesis, while at the other two universities I presented my work on Social Web Mining, which has been done in collaboration with different institutes in Pakistan.

As many researchers know, social media is not just a toy for the masses but also a pool of Web data from which Web mining researchers can draw lots and lots of data to further their research in various directions. The direction I took up in the talks was related to my research area, "Web Search": it focused on what social media tools have to offer search engines and how social media, with its vast pool of data, serves as an effective enhancement tool for search engines. Beyond that, I talked about how my research (in particular on Twitter) can provide an effective news monitoring platform useful for media outlets, journalists, political organizations and even governments. I then focused on two works by my research group in this direction: 1) blogosphere clustering, and 2) Twitter as a real-time news analysis service.

The slides are attached here, and I also managed to record a video during the last talk at UPM (Universiti Putra Malaysia), so those interested can listen to the talk while viewing the slides. As before, interested students/researchers may contact me personally if they wish to work in any of these areas, as my research group is actively seeking students to work on research publications in this domain.

The talk is in four parts which I have included in this post:

Sunday, March 20, 2011

Good Bye Korea: Memories of Database and Multimedia Lab of KAIST

Tonight happens to be my second-to-last night in Korea, and Korea for me was for the most part limited to my lab, the Database and Multimedia Lab of KAIST. Even tonight, as I am about to leave the land of Kimchi, I am working in the lab, doing some final experiments for my Web crawling paper. I still have lots and lots of work remaining, cleaning the house and packing many small things, and yet here I am in the lab finishing the final jobs for my paper.

It is said that the path of a graduate student, especially in a subject as innovative as Computer Science, is really hectic and requires a lot of patience. I experienced this firsthand during my Master's degree at KAIST. Though it has been a journey of immense stress and pressure, it has been enjoyable and fun all along. Life in the Database and Multimedia Lab in particular has a culture of its own. Even in the "kaali raats" (Urdu for "black nights", the term used by Database and Multimedia Lab members for a night spent entirely in the lab), we had fun, and I will surely miss each and every member of this family of mine. I may or may not come back to Korea; that is still undecided, but with this video I wish to thank my family at KAIST. We have had some extremely tough times, but we have been a family, and the time spent in this lab is one of my most precious memories: I learned a lot, both about scientific research and about enjoying work.

Friday, January 7, 2011

Correlating News and Tweets: "Ins and Outs of News: Twitter as a Real-Time News Analysis Service"

The Web has undergone a massive transformation, with its read-only nature diminishing and evolving into a read-write nature. Social networks are one of the driving forces behind this transformation, and hence the Social Web can be seen as a fundamental source of ever more UGC (user-generated content).

The phenomenon of UGC has also had a significant impact on Web Search, which happens to be my area of research: a study conducted in 2010 put Facebook ahead of Google in terms of Web site hits. Many of the major search engine companies, such as Google, Yahoo and Bing, are now looking at ways to take the Social Web into account in their search results. The WWW 2010 paper titled "The Anatomy of a Large-Scale Social Search Engine" describes the phenomenon in considerable detail, and I recommend it as a must-read for those interested in the field. In fact, the team behind this paper created the social search system Aardvark, which has since been acquired by Google.

Despite the tremendous importance and attention given to the concept of social search, one significant domain within this area, real-time news analysis, has not yet been explored much, and this recent paper by me and my research group attempts to explore it. In this paper we present a system that aims to identify and detect hot news items in real time by taking into account user popularity and temporal features. We present a prototype of the approach using the popular microblogging service Twitter, along with the results of some initial evaluations.
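The post only names the ingredients (user popularity and temporal features), not the formula the paper uses. As a hypothetical illustration, the sketch below scores each news item by summing over the tweets that mention it, weighting each tweet by log(1 + followers) as an assumed proxy for user popularity and by an exponential decay with an assumed half-life as the temporal feature. The Tweet type, the hotness function and the half_life_hours parameter are made up for this example.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Tweet:
    news_id: str      # hypothetical: id of the news item this tweet was matched to
    followers: int    # assumed proxy for the author's popularity
    age_hours: float  # hours between the tweet and "now"


def hotness(tweets, half_life_hours=6.0):
    """Score news items by summing popularity-weighted, time-decayed tweets."""
    decay = math.log(2) / half_life_hours  # turn the half-life into a decay rate
    scores = defaultdict(float)
    for t in tweets:
        popularity = math.log1p(t.followers)      # dampen huge follower counts
        recency = math.exp(-decay * t.age_hours)  # older tweets count for less
        scores[t.news_id] += popularity * recency
    # Highest score first: the "hottest" news items right now
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    sample = [
        Tweet("budget", followers=50_000, age_hours=1.0),
        Tweet("budget", followers=200, age_hours=2.0),
        Tweet("cricket", followers=1_000, age_hours=0.5),
        Tweet("cricket", followers=300, age_hours=30.0),
    ]
    for news_id, score in hotness(sample):
        print(f"{news_id}: {score:.2f}")
```

The logarithm keeps one very popular account from dominating a story, and the half-life controls how quickly yesterday's news cools down; both are tuning choices for this sketch, not values from the paper.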

The proposed system analyzes real-time news using data from Twitter. We describe news services, followed by an architecture for assessing news popularity. The architecture is built upon a Web crawling framework and a news parser, followed by the application of natural language processing techniques to the news data, which is finally linked with the Twitter Search API. At the user interface end, we use a simple timeline-based visualization to show the popularity of news over time. Furthermore, data from the popular news service Dawn.com was crawled daily over a period of 10 days and analyzed for correlation with tweets; this analysis reveals interesting results, such as the news bias exhibited by news services. Below is the paper, which can also be downloaded.
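The post does not detail how crawled news items are linked with tweets. A minimal sketch, assuming the headlines and tweets have already been fetched (the Twitter Search API call is deliberately left out) and that a tweet "matches" a headline when it contains at least half of the headline's keywords, could look like this. The tokenizer, stopword list, threshold and function names are all hypothetical; the per-day counts it returns are the kind of data a timeline-based visualization would plot.

```python
import re
from collections import Counter
from datetime import date

# Tiny illustrative stopword list (an assumption; a real system would use a fuller one)
STOPWORDS = {"a", "an", "the", "in", "of", "on", "for", "to", "and", "is", "at", "with"}


def keywords(text):
    """Lowercase, split into alphanumeric tokens, drop stopwords and tokens of <= 2 chars."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS and len(t) > 2}


def matches(headline_kw, tweet_text, min_overlap=0.5):
    """True if the tweet contains at least `min_overlap` of the headline's keywords."""
    if not headline_kw:
        return False
    shared = headline_kw & keywords(tweet_text)
    return len(shared) / len(headline_kw) >= min_overlap


def popularity_timeline(headline, tweets, min_overlap=0.5):
    """Count matching tweets per day; `tweets` is a list of (date, text) pairs."""
    kw = keywords(headline)
    per_day = Counter(day for day, text in tweets if matches(kw, text, min_overlap))
    return sorted(per_day.items())


if __name__ == "__main__":
    headline = "Flood relief funds announced for Sindh"
    tweets = [
        (date(2011, 1, 1), "Flood relief funds for Sindh finally announced!"),
        (date(2011, 1, 1), "Watching cricket tonight"),
        (date(2011, 1, 2), "Are the Sindh flood relief funds enough?"),
    ]
    for day, count in popularity_timeline(headline, tweets):
        print(day.isoformat(), count)
```

Keyword overlap is a deliberately crude stand-in for the natural language processing step mentioned above; it is easy to inspect, but it will miss tweets that paraphrase a story without reusing its words.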

The paper will be published in the proceedings of the workshop "Visual Interfaces to the Social and Semantic Web (VISSW 2011)", co-located with the International Conference on Intelligent User Interfaces (IUI 2011), to be held at Stanford University in February 2011. I am sharing it on my blog at the request of some students who have shown a lot of interest in the field. For further details/questions/feedback, a personal email to arjumand_younus@yahoo.com would be preferred. Also, students willing to work with our research group in this direction may contact me in person. I will also upload the slides and talk for this paper soon.

Thursday, December 30, 2010

Master's Thesis: Design and Implementation of a Scalable High-Speed Parallel Web Crawler

I have been planning to share this for quite some time, and today I finally managed to find the time. My Master's thesis covers a very fundamental component of search engines, namely Web crawlers. The research focus of my work is crawler efficiency, which relates to the scalability and speed of a Web crawler.

The proposed architecture extends the DRUM (Disk Repository with Update Management) technique, proposed in the best paper of WWW 2008 titled "IRLbot: Scaling to 6 Billion Pages and Beyond", where it is used in a single-machine Web crawler. In the thesis, I extend it to a parallel crawler.
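To give a feel for the DRUM idea the thesis builds on, here is a heavily simplified, in-memory sketch (an illustration, not code from IRLbot or from the thesis). URL keys are buffered into buckets that each cover a range of the key space; when a bucket fills up, it is sorted and merged in one sequential pass with a sorted repository of keys already seen, so new URLs are discovered in batches rather than with one random lookup per URL. In real DRUM the buckets and the repository live on disk, and the parallel extension from the thesis is not shown here; the class name, bucket sizes and 8-byte SHA-1 key are assumptions made for illustration.

```python
import hashlib


class DrumSketch:
    """Toy, in-memory version of DRUM-style batched check+update for URL de-duplication.

    Assumption: real DRUM spills buckets to disk files and merges them with a sorted
    on-disk repository; here both the buckets and the repository are Python lists.
    """

    def __init__(self, num_buckets=4, bucket_capacity=1000):
        self.num_buckets = num_buckets
        self.bucket_capacity = bucket_capacity
        self.buckets = [[] for _ in range(num_buckets)]
        self.repository = []  # sorted keys of every URL seen so far
        self.unique = []      # URLs confirmed as new, ready to be crawled

    @staticmethod
    def _key(url):
        # Fixed-size key: the first 8 bytes of a SHA-1 digest (an illustrative choice)
        return hashlib.sha1(url.encode("utf-8")).digest()[:8]

    def check_update(self, url):
        """Buffer a URL; nothing is looked up until its bucket fills up."""
        key = self._key(url)
        # Range-partition the key space by the first byte of the key
        bucket = self.buckets[key[0] * self.num_buckets // 256]
        bucket.append((key, url))
        if len(bucket) >= self.bucket_capacity:
            self._merge(bucket)

    def flush(self):
        """Process whatever is still buffered (e.g. at shutdown)."""
        for bucket in self.buckets:
            if bucket:
                self._merge(bucket)

    def _merge(self, bucket):
        """Sort one bucket and merge it with the sorted repository in a single pass."""
        batch = sorted(set(bucket))  # drop exact duplicates, order by key
        merged, i, last_key = [], 0, None
        for key, url in batch:
            if key == last_key:
                continue  # same key appeared twice in this batch
            last_key = key
            # Copy repository keys smaller than `key` straight into the output
            while i < len(self.repository) and self.repository[i] < key:
                merged.append(self.repository[i])
                i += 1
            if i < len(self.repository) and self.repository[i] == key:
                continue  # check: seen before, so the URL is not new
            merged.append(key)  # update: record the new key in sorted order
            self.unique.append(url)
        merged.extend(self.repository[i:])  # keep the untouched tail of the repository
        self.repository = merged
        bucket.clear()


if __name__ == "__main__":
    drum = DrumSketch(num_buckets=2, bucket_capacity=2)
    for u in ["http://a.com", "http://b.com", "http://a.com", "http://c.com"]:
        drum.check_update(u)
    drum.flush()
    print(sorted(drum.unique))
```

Batching is the point of the design: a crawler discovering thousands of URLs per second cannot afford a disk seek for each one, but it can afford to sort a batch and stream it past a sorted file.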

Following is my Master's thesis defense presentation; I successfully defended my thesis on 16th December, 2010.

The full text of the thesis will be available soon. Interested students/researchers may contact me with any questions, comments or feedback. Any researcher interested in the domain of Web crawling may also contact me with suggestions. The full text of the thesis can also be requested via email.

Tuesday, December 7, 2010

When There's Nothing to Eat in Korea There's Always Kimbab

Back in Pakistan, when we had no time to cook anything, we had the option of Maggi noodles. But in Korea almost all noodles contain some sort of pork (or pork ingredients), so is there no option for a 2-minute meal? Nope, there is the all-time Korean favorite, Kimbab. Bab in Korean means cooked rice, so Pakistanis describe it as a rice burger :), quite unusual and strange, right?

These days my husband and I are in the Master's thesis defense phase, so most of the time we have no time to cook but want something quick to eat, and Kimbab comes to the rescue.

Gimbap or kimbap is a popular Korean dish made from steamed white rice (bap) and various other ingredients, rolled in gim (sheets of dried laver seaweed) and served in bite-size slices. Gimbap is often eaten during picnics or outdoor events, or as a light lunch, served with danmuji or kimchi. It is similar to the better-known Japanese sushi. We eat the Yaachae (vegetable) or chamchi (tuna) kind, as all the others contain meat. Here are images of two variations of Kimbab. The first is served in restaurants with a very delicious soya soup that is excellent in winter.
The second is sold in stores, packed like a sandwich, and is really cheap.


So now I am off to buy my Kimbab from the KAIST cafeteria store :)

Wednesday, November 17, 2010

South Korea: the Pioneer of Social Networks

Social networks have revolutionized Internet usage patterns. Some studies estimate that users now spend more time on social networks than on search engines. The most popular social network services include Twitter and Facebook.

Today I will shed some light on the use of social networks in South Korea and their role in what can be termed the social network revolution.

To the surprise of many readers of my blog, South Korea is the pioneer of social networks, although you cannot find widespread use of Facebook in South Korea. In South Korea, Twitter and Facebook are dwarfed by local social networks, the most popular of which is CyWorld by SK Communications. CyWorld is used by 78 percent of Koreans with Internet access. The Facebook-style service was a pioneer in social networking, launching way back in 1999 and as a mobile service in 2004. SK's instant messaging service, known as NateOn, is also the most widely used in Korea and is three times more popular than Microsoft's Live Messenger. It is a popular myth, especially among Pakistanis, that the credit for making social networks popular goes to Mark Zuckerberg and the like, whereas the real pioneer is South Korea. Furthermore, South Korea's CyWorld was so popular and unique that a US counterpart named CyWorld US was launched in 2006. In Korea social networks are not just a social gathering point; they are a very successful business model, and according to my Korean lab mates, Koreans have known social networks since the days of dial-up Internet.

In my view, Korea's example is one that we Pakistanis can learn from. Back in Pakistan, when Facebook was banned after the "Draw Muhammad" fan page controversy, some people argued that creating a local social network is not a workable solution since we need to communicate internationally. In my opinion that argument comes from defeated minds and from a lack of knowledge about the science of social networks and their evolution patterns. CyWorld is a forum that the international community has joined (yes, many foreigners joined CyWorld to interact with Koreans), and it has helped Korea earn a respectable place in the world of SNS and mobile services, with all of this increasing international communication. Moreover, the data users share on social networks is a critical asset for any country and should not become a commodity for another country, which is currently the case because the significance of local social networks is not understood.

For all the above reasons, Korea's case holds important lessons for many countries in Asia, and in particular for Pakistan, where anything foreign, especially from the US, is considered irreplaceable, whereas many other countries of the world are moving in a different direction.

Tuesday, September 14, 2010

Experience at TEDxKAIST: Happiness for Science and Science for Happiness

Almost all of us know about the TED platform, which stands for Technology, Entertainment, Design; it is an annual event where the world's leading thinkers and doers are invited to share what they are most passionate about. The TED platform gave birth to an accompanying initiative called TEDx, a program that enables local communities such as schools, businesses, libraries, neighborhoods or just groups of friends to organize, design and host their own independent TED-like events. Such an event was organized at KAIST on 11th September, 2010 by some KAIST students, and the team was wonderfully led by Mark Whiting, a Master's student in the Department of Industrial Design, KAIST. They called it TEDxKAIST, and it turned out to be an energizing event for the hard-working KAIST students.


KAIST is one of Korea's most prestigious science and technology institutions, and with this in mind the theme was well thought out: "Happiness for Science and Science for Happiness". It is a significant one, as KAIST is all about hard-working students actively engaged in scientific contributions and advancement. There is a famous saying among KAIST students that KAIST never sleeps: KAISTIANs struggle hard to survive in the seeming paradox of hard work and true happiness. The inspiration for the theme came from this excerpt:

"Of the more highly educated sections of the community, the happiest in the present day are the men of science. Many of the most eminent of them are emotionally simple, and obtain from their work a satisfaction so profound that they can derive pleasure from eating and even marrying." - Bertrand Russell (1930); The Conquest of Happiness


In this blog post I will share my takeaways from TEDxKAIST and some key points from the speeches that everyone associated with science should keep in mind to become a "happy scientist contributing to a happy world." Following is a brief profile of each speaker at TEDxKAIST who gave their view of a happy scientist:
  • Dr. Young Hae Noh is a Professor at the School of Humanities and Social Sciences at KAIST and has also served as dean of multiple departments at KAIST.
  • Dr. Minhwa Lee is the Business Ombudsman, who acts as a link between the government and small and medium businesses; he is also a Professor at KAIST.
  • Spanish Koffee is a very famous music group in Korea that pursues free distribution of digital music in its mission of "Passion worth Spreading."
  • Dr. Woonseung Yeo is a Professor at the Graduate School of Culture Technology at KAIST; his PhD work at Stanford University involved sonification, i.e. conveying information through audio signals.
  • Sungdong Park is the CEO of Satrec Initiative, the world's leading company in high-performance, cost-effective Earth observation small satellite solutions. He won a Civil Merit Medal, a Presidential Commendation and an Industrial Service Medal for his contributions to Korean space science and technology.
  • Byungwoo Jang is the CEO of LG OTIS and has served LG for many years. He comes from a family of great scholars of English literature.
  • Dr. Don Norman is a distinguished visiting Professor at KAIST and holds many other significant positions around the world. His work has resulted in a number of influential books, including “The Design of Everyday Things” and, most recently, “Living With Complexity.”

Professor Noh began her speech with a quote by Benjamin Zander defining success: "Success is not about wealth, fame or power; it's about how many shining eyes I have found." She shared the story of her music classes: a love story, but a very different one, a Professor-student love story. She shared her tips on being a successful Professor: one who brings out the full talent in her students, who both loves and is loved by her students, and who incites passion and enthusiasm in them, which in my opinion is quite lacking in a majority of today's students. She advocated the idea that Professors should give students the freedom to discover their potential and greatness on a journey of their own, while at the same time being keen observers who take joy in discovering interesting features of their students.


It was really interesting to observe the scientist's definition of happiness: surpassing challenges, overcoming obstacles, and sharing and inculcating passion all around. This view came out most clearly in the talk by Sungdong Park. His story was one of courage and bravery, of making the impossible possible despite all hardships, and of rising after setbacks. He shared a newspaper cutting that said, "First Park Sung Dong got mad. Then he got even." Before establishing Satrec Initiative, he led the development of advanced small satellites at KAIST for 10 years, but then something happened that eventually led him to the success he enjoys today, though the path was not easy. His government lab's staff were laid off; it was a hard time, but he did not lose hope and launched a venture with his old lab's technology. His vision was to make all of Satrec's engineers millionaires, an apparently crazy idea, but with passion and devotion Park made this possible, and today Satrec is the only private company in Korea that is a member of the International Astronautical Federation (IAF), and it is deploying satellite solutions for Dubai, Malaysia, Singapore and Turkey.

Another talk that inspired me a lot, and that echoed things I have always advocated for science and engineering students, was the talk by Byungwoo Jang, whose main theme was that technology needs art. His talk was about the importance of literature for science and engineering students: without literature any student is incomplete, for literature is a way to imagine yourself in the position of another person. Today people lack a feeling for the pain of others, which is making the world an insensitive place, and one way to overcome this is through literature. The LG OTIS CEO highlighted how reading books makes life more meaningful and transforms individuals; many successful people have literature behind them. Thomas Edison is reported to have read 3.5 million pages in his lifetime; think of all the imagination and creativity he derived from these books. Abraham Lincoln had an unfortunate childhood; his life was transformed completely after reading the biography of President Washington, and he decided to become President. Reading books and works of literature, which today's science and engineering students neither do nor enjoy much, is a very healthy habit for the mind and can be a new source of creativity and inspiration for tomorrow's scientists, so they must not give up this habit.

At the end came the talk by Professor Don Norman, which was undoubtedly the highlight of the entire event. What was really surprising about this talk was that he did not use any slides; instead he drew all the material he wanted to present on a whiteboard, and the talk was inspiring indeed, with lots and lots of lessons for people in science and engineering. The talk was fundamentally organized around three pairs of concepts: happiness and sadness, satisfaction and dissatisfaction, and optimism and pessimism.

He first asked the audience who was happy and who was not, and then said that those who answered neither happy nor unhappy had made a smart choice. If you're happy, it means you are not doing well in your pursuits in life, because on every path happiness comes with lots and lots of unhappiness; being successful means not going the normal way but through lots and lots of pain and difficulty. He then went further. Happiness and sadness are just a state, which can be measured, and when on the path to achieving something one should not worry about being happy. Satisfaction and dissatisfaction are a judgment, which no one can measure except the person himself/herself. Optimism and pessimism are points of view, and this is what determines everything. As an example of point of view, he described the fear a person feels when asked to walk on a plank placed in mid-air, as opposed to the absence of fear when asked to walk on the same plank placed on the floor. Points of view are driven by a human's emotional system, approach and instincts, and this has to be the driving factor if a scientist is to derive happiness from science, happiness for both himself and the world.

He related a story about his experience at Apple that shows how a fusion of happiness and anxiety can lead to success in science. His tip was that when thinking about new ideas and embarking on a creative journey, one must have fun, relax and be in a comfortable state of mind, but once a decision has been made on an idea, accomplishment comes through anxiety and a worried state of mind. Lastly, his talk threw light on the paradox of urgent problems vs. important problems: it is the important problems that need to get done first, because what you want to do in life is the important thing, and that makes the difference.

This event was a great and memorable experience during my stay at KAIST, and surely the lessons and tips shared here will help me throughout my academic life.