Sunday, March 27, 2011

Invited Talk in Research Universities of Malaysia: The Web Goes Social

As I write this blog post, I am packing and getting ready to leave Malaysia, where I came to give invited research talks at three of its top universities: UTM, MMU and UPM. It was a hectic but fun-filled trip, with lots of interaction with faculty and students. Many faculty members at the Malaysian universities responded enthusiastically and asked about research directions in social media. Answering their questions was a great learning experience for me, as were the research discussions we shared, which may prove fruitful for joint research collaborations in the future.

At UTM I presented my work on Web crawlers done in the Database and Multimedia lab of KAIST as part of my Master's thesis, while at the other two universities I presented my work on Social Web Mining which has been done in collaboration with different institutes of Pakistan.

As many researchers know, social media is not just a toy for the masses but also a pool of Web data that gives Web mining researchers lots and lots of data to further their research in various dimensions. The dimension I took up in the talks was related to my research area, "Web Search", and focused on what social media tools have to offer search engines and how social media, with its vast pool of data, can serve as an effective enhancement tool for them. I also talked about how my research (in particular on Twitter) can provide an effective news monitoring platform that is useful for media outlets, journalists, political organizations and even governments. I then focused on two works by my research group in this dimension: 1) blogosphere clustering, and 2) Twitter as a real-time news analysis service.

The slides are attached here. I also managed to record a video during the last talk at UPM (Universiti Putra Malaysia), so those interested can listen to the talk while viewing the slides. As before, interested students/researchers may contact me personally if they wish to work in any of these areas, as my research group is now actively seeking students to work on research publications in this domain.




The talk is in four parts which I have included in this post:











Sunday, March 20, 2011

Good Bye Korea: Memories of Database and Multimedia Lab of KAIST

Tonight happens to be my second-to-last night in Korea, and Korea for me was for the most part limited to my lab, the Database and Multimedia Lab of KAIST. Even tonight, as I am about to leave the land of Kimchi, I am working in the lab on some final experiments for my Web crawling paper. I still have lots and lots of work remaining, such as cleaning the home and packing many small things, and yet here I am in the lab doing the final jobs for my paper.

It is said that the path of a graduate student, especially in a subject as innovative as Computer Science, is really hectic and requires a lot of patience. I experienced this firsthand during my Master's degree at KAIST. Though it has been a journey of immense stress and pressure, it has been enjoyable and fun all along. In particular, life in the Database and Multimedia lab has a culture of its own. Even in the "kaali raats" (the term used by Database and Multimedia Lab members for a night spent entirely in the lab), we had fun, and I will surely miss each and every member of this family of mine. I may or may not come back to Korea; that is still undecided. With this video I wish to thank my family at KAIST. We have had some extremely tough times, but we have been a family, and the times spent in this lab are among my most precious memories; I learned a lot there about both scientific research and enjoying work.





Friday, January 7, 2011

Correlating News and Tweets: "Ins and Outs of News Twitter as a Real-Time News Analysis Service"

The Web has seen a massive transformation, with its read-only nature steadily giving way to a read-write nature. Social networks are one of the driving forces behind this transformation and hence the Social Web can be seen as a fundamental source of more and more UGC (user-generated content).

The phenomenon of "UGC" has also had a significant impact in the domain of Web Search, which happens to be my area of research: a study conducted in 2010 puts Facebook ahead of Google in terms of Web site hits. Major search engines such as Google, Yahoo and Bing are now looking at ways to incorporate the Social Web into their search results. The WWW 2010 paper titled "The Anatomy of a Large-Scale Social Search Engine" describes the phenomenon in considerable detail and I recommend it as a must-read to those interested in the field. In fact, the team behind this paper created a social search system, Aardvark, that has now been acquired by Google.

Despite the tremendous importance and attention being given to the concept of social search, one significant domain within this area has not yet been explored much, and a recent paper by my research group and me attempts to explore it. In this paper we present a system which aims to detect hot news items in real time by taking into account user popularity and temporal features. We present a prototype of the approach using the popular microblogging service "Twitter" and report the results of some initial evaluations of our approach.

The proposed system analyzes real-time news by using data from Twitter. We give a description of news services, followed by an architecture for assessing news popularity. The architecture is built upon a Web crawling framework and a news parser, followed by the application of natural language processing techniques to the news data, which is finally linked with the Twitter Search API. At the user interface end, we use a simple timeline-based visualization to showcase the popularity of news across time. Furthermore, data from the popular news service Dawn.com was crawled on a daily basis over a period of 10 days and analyzed for correlation with tweets. This analysis reveals interesting results such as the news bias exhibited by news services. Below is the paper, which can be downloaded as well.
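To make the pipeline above a little more concrete, here is a minimal Python sketch of the last two stages: matching tweets to a news headline and counting matches per day for a timeline. It is only an illustration under my own assumptions and not the code from the paper. The stop-word list stands in for the natural language processing step, the `(timestamp, text)` pairs stand in for results that would come from a tweet search, and the names `extract_keywords`, `matches` and `popularity_timeline` are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: a tiny stop-word list stands in for the NLP step.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "to", "for", "and", "as", "at", "by", "is"}


def extract_keywords(headline):
    """Lowercase the headline and keep the tokens that are not stop words."""
    tokens = re.findall(r"[a-z0-9]+", headline.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def matches(keywords, tweet_text, min_overlap=2):
    """A tweet matches when it shares at least min_overlap keywords with the headline."""
    tweet_tokens = set(re.findall(r"[a-z0-9]+", tweet_text.lower()))
    return len(keywords & tweet_tokens) >= min_overlap


def popularity_timeline(headline, tweets):
    """Count matching tweets per calendar day.

    tweets: iterable of (iso_timestamp, text) pairs, a hypothetical
    stand-in for whatever a tweet search returns.
    """
    keywords = extract_keywords(headline)
    counts = Counter()
    for timestamp, text in tweets:
        if matches(keywords, text):
            day = datetime.fromisoformat(timestamp).date().isoformat()
            counts[day] += 1
    return dict(sorted(counts.items()))


# Made-up sample data, for illustration only.
sample_tweets = [
    ("2011-01-05T10:00:00", "Floods in Sindh are getting worse"),
    ("2011-01-05T18:30:00", "Sindh province floods update"),
    ("2011-01-06T09:00:00", "Cricket match today"),
]
timeline = popularity_timeline("Floods hit Sindh province", sample_tweets)
```

The per-day counts in `timeline` are the kind of series a timeline-based visualization could plot. A real system would need a far better matching step than keyword overlap.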




The paper will be published in the proceedings of the workshop "Visual Interfaces to the Social and Semantic Web (VISSW 2011)", co-located with the International Conference on Intelligent User Interfaces (IUI 2011) to be held at Stanford University in February, 2011. I am sharing it on my blog at the request of some students who have shown a lot of interest in the field. For further details/questions/feedback, a personal email to arjumand_younus@yahoo.com would be preferred. Students willing to work with our research group in this dimension may also contact me in person. I will be uploading the slides and talk for this paper soon.

Thursday, December 30, 2010

Master's Thesis: Design and Implementation of a Scalable High-Speed Parallel Web Crawler

I have been planning to share this for quite some time, and today I finally found the time to do so. My Master's thesis covers a very fundamental component of search engines, namely Web crawlers. The research focus of my work is crawler efficiency, which is related to the scalability and speed of a Web crawler.

The proposed architecture extends the DRUM technique proposed in the best paper of WWW 2008, titled "IRLbot: Scaling to 6 Billion Pages and Beyond". There the technique is used for a single-machine Web crawler; in the thesis, I extend it to a parallel crawler.
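For readers new to the area, the following Python sketch illustrates two general ideas that usually come up in this setting. It is not the architecture from my thesis and not DRUM itself, only a toy under my own assumptions. The first idea is routing each URL to a crawler node by hashing its host. The second is checking URLs for uniqueness in batches rather than one at a time; in DRUM the batch is merged against a disk repository, whereas here an in-memory set stands in for the disk. The names `stable_hash`, `assign_node` and `BatchedSeenSet` are hypothetical.

```python
import hashlib
from urllib.parse import urlsplit


def stable_hash(text):
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest()[:8], "big")


def assign_node(url, num_nodes):
    """Route a URL to a crawler node by hashing its host, so one host stays on one node."""
    host = urlsplit(url).hostname or ""
    return stable_hash(host) % num_nodes


class BatchedSeenSet:
    """Buffers URL hashes and checks them against the seen set in one sorted pass."""

    def __init__(self, batch_size=4):
        self.batch_size = batch_size
        self.pending = {}   # url hash -> url, waiting for the next merge
        self.seen = set()   # assumption: in-memory stand-in for an on-disk repository

    def add(self, url):
        """Queue a URL; returns the newly discovered URLs when a batch is merged."""
        self.pending[stable_hash(url)] = url
        if len(self.pending) >= self.batch_size:
            return self.flush()
        return []

    def flush(self):
        """Merge the batch: keep unseen URLs, record them, and clear the buffer."""
        new_urls = []
        for key in sorted(self.pending):
            if key not in self.seen:
                self.seen.add(key)
                new_urls.append(self.pending[key])
        self.pending.clear()
        return new_urls
```

In a parallel setting, each node could own one such structure and receive only the URLs that `assign_node` maps to it. How the batches are sized, stored and exchanged between nodes is exactly where a real design has to do the hard work.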

Following is the presentation from my Master's thesis defense, which I passed on 16th December, 2010.

The full text of the thesis will be available soon; until then it can be requested via email. Interested students/researchers may contact me with any questions, comments or feedback. Any researcher interested in the domain of Web crawling may also contact me with suggestions.

Tuesday, December 7, 2010

When There's Nothing to Eat in Korea There's Always Kimbab

Back in Pakistan, whenever there was no time to cook anything, we had the option of Maggi noodles. But in Korea almost all noodles contain some sort of pork (or pork ingredients), so is there no option for a 2-minute meal? Nope, there is the all-time Korean favorite, Kimbab. Bab in Korean means cooked rice, so Pakistanis describe it as a rice burger :), quite unusual and strange, right?

These days my husband and I are in the Master's thesis defense phase, so most of the time we have no time to cook but want something quick to eat, and Kimbab comes to the rescue.

Gimbap or kimbap is a popular Korean dish made from steamed white rice (bap) and various other ingredients, rolled in gim (sheets of dried laver seaweed) and served in bite-size slices. Gimbap is often eaten during picnics or outdoor events, or as a light lunch, served with danmuji or kimchi. It is similar to the better-known Japanese sushi. We eat the Yaachae (vegetable) or chamchi (tuna) one as all the others have meat. Here are images of two variations of Kimbab. The first one is served in restaurants with a very delicious soya soup that is excellent for winters.
The second one is found in stores packed like a sandwich and is really cheap.






So now I am off to buy my Kimbab from the KAIST cafeteria store :)

Wednesday, November 17, 2010

South Korea: the Pioneer of Social Networks

Social networks have considerably changed Internet usage patterns. Some studies have estimated that users now spend more time on social networks than on search engines. The most popular social network services include Twitter and Facebook.

Today I will shed some light on the use of social networks in South Korea and their role in what can be termed the social network revolution.

To the surprise of many readers of my blog, South Korea is the pioneer of social networks, although you cannot find widespread use of Facebook there. In South Korea, Twitter and Facebook are dwarfed by the local social networks, the most popular of which is Cyworld by SK Communications. The Cyworld social network is used by 78 percent of Koreans with Internet access. This Facebook-style service was a pioneer in social networking, launching way back in 1999 and adding a mobile service in 2004. SK's instant messaging service, known as NateOn, is also the most widely used in Korea and is three times more popular than Microsoft's Live Messenger. It is a popular myth, especially amongst Pakistanis, that the credit for making social networks popular goes to Mark Zuckerberg and the like, whereas the real pioneer is South Korea. Furthermore, South Korea's Cyworld was so popular and unique that a US counterpart by the name of Cyworld US was launched in 2006. In Korea social networks are not just a social gathering point; they are a very successful business model, and according to my Korean lab mates, social networks have been known to Koreans since the age of dial-up Internet.

In my view, Korea's example is one that we Pakistanis can learn from. Back in Pakistan, when Facebook was banned after the "Draw Muhammad" fan page controversy, some people argued that creating a local social network is not a workable solution since we need to communicate internationally. In my opinion that argument comes from defeated minds and from a lack of knowledge of the science of social networks and their evolution patterns. Cyworld is a forum that the international community has joined (yes, many foreigners joined Cyworld to interact with Koreans) and learned from, which led Korea to earn a respectable place in the world of SNS and mobile services and caused international communication to increase. Moreover, the data users share on social networks is a critical asset for any country and should not become a commodity for any other country, which is currently the case because the significance of local social networks is not understood.

For all the above-mentioned reasons, Korea's case has important lessons for many countries in Asia, and in particular for Pakistan, where anything foreign, especially from the US, is considered irreplaceable even as many other countries of the world move in a different direction.

Tuesday, September 14, 2010

Experience at TEDxKAIST: Happiness for Science and Science for Happiness

Almost all of us know about the TED platform, which stands for Technology, Entertainment, Design; it is an annual event where the world's leading thinkers and doers are invited to share what they are most passionate about. The TED platform gave birth to an accompanying initiative called TEDx, a program that enables local communities such as schools, businesses, libraries, neighborhoods or just groups of friends to organize, design and host their own independent TED-like events. Such an event was organized at KAIST on 11th September, 2010 by some KAIST students, and the team was wonderfully led by Mark Whiting, a Master's student at the Department of Industrial Design, KAIST. They called it TEDxKAIST and it turned out to be an energizing event for the hard-working KAIST students.


KAIST is one of the nation's most prestigious science and technology institutions and, keeping this in mind, the theme was well thought out: "Happiness for Science and Science for Happiness". It is a significant one, as KAIST is all about hard-working students actively engaged in scientific contributions and advancement. There is a famous saying about KAIST students that KAIST never sleeps; the KAISTIANs struggle hard to survive in the seeming paradox of hard work and true happiness. The inspiration for the theme came from this excerpt:

"Of the more highly educated sections of the community, the happiest in the present day are the men of science. Many of the most eminent of them are emotionally simple, and obtain from their work a satisfaction so profound that they can derive pleasure from eating and even marrying." - Bertrand Russell (1930); The Conquest of Happiness


In this blog post I will share my takeaways from TEDxKAIST and some of the key points from the speeches that all people associated with science should keep in mind to become a "happy scientist contributing to a happy world." Following is a brief profile of each of the speakers who spoke at TEDxKAIST and gave their view of a happy scientist:
  • Dr. Young Hae Noh is a Professor at the School of Humanities and Social Sciences at KAIST and has also served as dean of multiple departments at KAIST.
  • Dr. Minhwa Lee is the Business Ombudsman, who serves as a link between the government and small and medium businesses; he is also a Professor at KAIST.
  • Spanish Koffee, a very famous music group in Korea which pursues free distribution of digital music in their mission of "Passion worth Spreading."
  • Dr. Woonseung Yeo is a Professor at the Graduate School of Culture Technology at KAIST; his PhD work at Stanford University included an introduction to the field of sonification, which means conveying information through audio signals.
  • Sungdong Park is the CEO of Satrec Initiative, which is the world's leading company in high-performance, cost-effective Earth observation small satellite solutions. He won a Civil Merit Medal, a presidential commendation and an Industrial Service Medal for his contributions to Korean space science and technology.
  • Byungwoo Jang is the CEO of LG OTIS and has served LG for many years. He comes from a family of great scholars of English literature.
  • Dr. Don Norman is a distinguished visiting Professor at KAIST and holds many other significant positions around the world. His work has resulted in a number of influential books including “The Design of Everyday Things” and most recently “Living With Complexity.”

Professor Noh began her speech with a quote on the definition of success by Benjamin Zander: "Success is not about wealth, fame or power; it's about how many shining eyes I have found." She shared her story about her musical classes, a love story but a very different one: a Professor-student love story. She shared her tips on being a successful Professor: one who brings out the talent in her students to the full, who is both loved by the students and loves them, and who incites passion and enthusiasm in the students, which in my opinion is quite lacking in a majority of today's students. She advocated the idea that Professors should give students freedom by allowing them to discover their potential and greatness in a journey of their own, and that at the same time Professors should be keen observers of their students and should take joy in discovering their interesting features.


It was really interesting to observe the scientist's definition of happiness: surpassing challenges and overcoming obstacles, and sharing and inculcating passion all around, is what happiness is from a scientist's point of view. This view came out more clearly in the talk by Sungdong Park. His story was one of courage and bravery, of making the impossible possible despite all hardships and of rising after setbacks. He shared a newspaper cutting which said, "First Park Sung Dong got mad. Then he got even." Before establishing Satrec Initiative he led the development of advanced small satellites at KAIST for 10 years, but then something happened which eventually led him to the success he enjoys today, though the path was not easy. His government lab team was laid off; it was a hard time, but he did not lose hope and launched a venture with his old lab's technology. His vision was to make all of Satrec's engineers millionaires. It was apparently a crazy idea, but with passion and devotion Park made it possible, and today Satrec is the only private company in Korea that is a member of the International Astronautical Federation (IAF) and is deploying satellite solutions for Dubai, Malaysia, Singapore and Turkey.

Another talk that inspired me a lot, and that contained the things I have always advocated for science and engineering students, was the talk of Byungwoo Jang, whose main theme was that technology needs art. His talk was about the importance of literature for science and engineering students: without literature any student is incomplete, for literature is a way to imagine yourself in the position of another person. Today people often fail to feel the pain of others, which is making the world an insensitive place, and one way to overcome this is through literature. The LG OTIS CEO highlighted how reading books makes life more meaningful and transforms individuals; many successful people have literature behind them. Thomas Edison is reported to have read 3.5 million pages in his lifetime; think of all the imagination and creativity he derived from all these books. Abraham Lincoln had an unfortunate childhood, but his life was transformed completely after reading the biography of President Washington, and he decided to become a President. Reading books and works of literature, which today's students of science and engineering neither do nor enjoy much, is a very healthy habit for the mind and can be a new source of creativity and inspiration for tomorrow's scientists, so they must not give up this habit.

At the end was the talk of Professor Don Norman, which was undoubtedly the highlight of the entire event. The really surprising thing about this talk was that he did not use any slides; instead he drew all the material he wanted to present on a white board. The talk was inspiring indeed, with lots and lots of lessons for people of science and engineering. The talk was fundamentally organized around the following

He first asked the audience who was happy and who was not, and then went on to say that those who said they were neither happy nor unhappy had made a smart choice. If you are happy, it means you are not doing well in your pursuit in life, because on every path happiness comes with a lot of unhappiness; being successful means not going the normal way but through lots and lots of pain and difficulty. He then explained three pairs of terms. Happiness and sadness are just a state, which can be measured, and when on the path to achieving something one should not worry about being happy. Satisfaction and dissatisfaction are a judgment, which no one can measure except the person himself or herself. Optimism and pessimism are points of view, and this is what determines everything. As an example of point of view, he described the fear a human feels when asked to walk on a plank placed in mid-air, as opposed to no fear when asked to walk on the same plank placed on the floor. Points of view are driven by a human's emotional system, approach and instincts, and this has to be the driving factor if a scientist is to derive happiness from his science: happiness for both himself and the world.

He related a story about his experience at Apple which shows how a fusion of happiness and anxiety can lead to success in science. His tip was that when thinking about new ideas and embarking on a journey of creativity, one must have fun, relax and be in a comfortable state of mind, but once a decision has been taken on an idea, accomplishment comes through anxiety and a worried state of mind. Lastly, his talk threw light on the paradox of urgent problems vs. important problems: it is the important problems that need to get done first, because what you want to do in life is the important thing and that makes the difference.

This event was a great experience and a memorable one during my stay at KAIST and surely the lessons and tips given here will help me throughout my academic life.