Tuesday, May 3, 2011

Interacting with Pakistani Students: Some Tips for Taking Up a Research Career

Every week or two I receive emails from students around the world asking for help with their research work and for tips on getting into a research career. However, there is a marked difference between the emails I receive from Pakistani students and those I receive from students in other parts of the world. European students in particular usually request my Master's thesis or papers, and at times ask questions about the techniques we use in our papers. Similarly, students from Korea, China, Hong Kong, Taiwan, Egypt, and Malaysia ask brilliant questions about research and are focused on a specific topic. In fact, they even suggest novel extensions to existing work, including pointers to useful techniques we could incorporate. In short, they have already identified a research path for themselves, and their questions are aimed at getting guidance on their chosen topic. On the other hand, most students from Pakistan have a single question: please suggest me some research topic or research idea?

Today I feel the urge to address this question from Pakistani students specifically, as I feel the issue has to be taken up carefully. My advice for such students is very simple: no one can tell you a research topic of your interest. I am sure Pakistani CS students will find this answer slightly confusing, so I will elaborate. Just as nobody can tell you what your favorite food is, no one can tell you which area of research you should pick, for that depends completely on your likes and dislikes. The fundamental problem with this question is that the students asking it have not even narrowed down the research area or domain in which they want to work; instead they simply ask others to suggest a research topic. The question would be understandable if the students had at least narrowed down the area in which they wish to work. Dear students, please remember one thing: if someone else tells you your research area, you may be able to finish the task at hand, but you will never find the passion that research needs. You will never enjoy your research, and research without enjoyment can never attain fruitful results.

"If you fancy a career as a researcher, you'll spend tens of thousands of hours on work over the next 10 years. The only way you're ever gonna spend 10,000 hours on research is only when you truly deeply love it. If something really engages you and makes you happy, then you will put in the kind of energy and time necessary to become an expert at it." - Click for Source

This is not to blame or strongly condemn the students. My point is to convey the mistake our students make, and I do not blame them for this state of affairs. In a country where education is more of a corporate business, where Computer Science education in particular is hijacked by technologists who know nothing about science, and where professors do not know international standards of research and are not even aware of the best academic conferences in their field, such confusion among students is bound to exist. The problem is clearly a lack of guidance for the students, and not many people wish to do anything about it; in fact there are some "technology experts" who are even cashing in on this "lack of guidance" for their own fame and publicity. The state of affairs is so pathetic that our students do not even know what a research paper is, let alone how to read one, and hence they fail to grasp the whole point of scientific research. When a student has no idea where to begin, how can he/she get any idea about a research topic?

Here I list some tips based on my research experience. They are specifically for students who wish to do research but have no clear idea of how to proceed.

1. Narrow down your research area: if you do not know which research areas exist within the broad field of Computer Science, no worries. Simply visit the web sites of the Computer Science departments of famous research universities such as MIT, Stanford, Berkeley, CMU, Cambridge, Oxford, CUHK, ANU, KAIST etc. and browse to their research sections, where you will find many research areas listed. Do not just get fascinated by the name of a particular research area; read more about it and then decide whether the area interests you or not.

2a. After step 1, i.e. identifying your research area, find the conferences and journals that are well known for that particular area. This task is not hard either: use DBLP, a Computer Science bibliography web site that lists all reputed conferences and journals. The name of a conference or journal will pretty much tell you whether it covers the field you have identified.
2b. In addition to step 2a, search Google for the names of famous research groups working in your identified research area. For instance, if the field you have narrowed down is Social Computing, simply search for "Research Groups Social Computing" and then browse the work of the well-known groups in that domain.

3. After listing the conferences and journals within your research area of interest, read the most popular and the latest papers from those conferences. For example, anyone interested in distributed systems would immediately discover Google's MapReduce paper as the de facto distributed computing standard and should read it. Another significant factor to look at is the number of citations a paper has received: read the most cited papers first to get a grip on the topic. Google Scholar will help you find the number of citations for a paper.

After reading 20-30 papers you will definitely come up with a crude idea. Refining that idea will of course require discussions with your advisor and senior researchers; you can even email the authors of some of the papers you read. Researchers love to share and increase knowledge, for that is the whole point of research. Unlike the corporate, commercial world, the research world does not like to hide things, for it is all about knowledge-sharing, and a researcher who does not share his/her knowledge is never looked upon with respect.

Another handy tool that can help immensely in research is Twitter. Although it is known as a social networking or micro-blogging service, many in the scientific community regard it as the new journal archive. Some of the groups you identify within your areas of interest will be active on Twitter, and you can follow them there for updates, for their latest work, and many a time for useful reading material that can help you a lot in your research. But a note of caution: don't bother them with silly questions like "please tell me a topic of research". They are mature researchers with top-quality students, and anyone who comes to them with such questions will be considered an alien, so this is where you have to be extra cautious.

Feel free to email me with any questions, and I will be glad to help. Please remember that a research career seems attractive on the surface, but it requires far more hard work than you would normally have to do in the software house or technology culture of Pakistan, because there are no ready-made sweets in research. Crafting and scientific knowledge discovery are what you will have to master, which of course requires years and years of effort.

Sunday, March 27, 2011

Invited Talk in Research Universities of Malaysia: The Web Goes Social

As I write this blog post, I am packing and getting ready to leave Malaysia, where I came to give invited research talks at three of its best universities, namely UTM, MMU and UPM. It was a very hectic but fun-filled trip, with lots of interaction with faculty and students. Many of the faculty members at the Malaysian universities gave an overwhelming response and inquired about directions for research using social media. It was a great learning experience for me to answer their questions and to have research discussions that may prove fruitful for joint collaborations in the future.

At UTM I presented my work on Web crawlers done in the Database and Multimedia lab of KAIST as part of my Master's thesis, while at the other two universities I presented my work on Social Web Mining which has been done in collaboration with different institutes of Pakistan.

As many researchers know, social media is not just a toy for the masses but also a pool of Web data that gives Web mining researchers lots and lots of data to further their research in various dimensions. The dimension I took up in the talks was related to my research area, "Web Search": I focused on what social media tools have to offer search engines and how social media, with its vast pool of data, can serve as an effective enhancement tool for search engines. I also talked about how my research (in particular on Twitter) can provide an effective news monitoring platform that can be useful for media outlets, journalists, political organizations and even governments. I then focused on two works by my research group in this dimension: 1) blogosphere clustering, and 2) Twitter as a real-time news analysis service.

Slides are attached here. I also managed to make a video during the last talk at UPM (Universiti Putra Malaysia), so those interested can listen to the talk while viewing the slides. As before, interested students/researchers may contact me personally if they wish to work in any of these areas, as my research group is now actively seeking students to work on research publications in this domain.




The talk is in four parts which I have included in this post:











Sunday, March 20, 2011

Good Bye Korea: Memories of Database and Multimedia Lab of KAIST

Tonight happens to be my second-to-last night in Korea, and Korea for me was for the most part limited to my lab, the Database and Multimedia Lab of KAIST. Even tonight, as I am about to leave the land of Kimchi, I am working in the lab on some final experiments for my Web crawling paper. I still have lots and lots of work remaining, such as cleaning the home and packing many small things, and yet here I am in the lab doing final jobs for my paper.

It is said that the path of a graduate student, and that too in a subject as innovative as Computer Science, is really hectic and requires a lot of patience. I experienced this first-hand during my Master's degree at KAIST. Though it has been a journey of immense stress and pressure, it has been enjoyable and fun all along. Life in the Database and Multimedia Lab in particular has a culture of its own. Even in the "kaali raats" (the term used by Database and Multimedia Lab members for a night spent entirely in the lab), we had fun, and I will surely miss each and every member of this family of mine. I may or may not come back to Korea; that is still undecided. But with this video I wish to thank my family at KAIST. We have had some extremely tough times, but we have been a family. The times spent in this lab are among my most precious memories, and there I learned a lot about both scientific research and enjoying work.





Friday, January 7, 2011

Correlating News and Tweets: "Ins and Outs of News Twitter as a Real-Time News Analysis Service"

The Web has seen a massive transformation, with its read-only nature diminishing more and more as it evolves into a read-write medium. Social networks are one of the driving forces behind this transformation, and hence the Social Web can be seen as a fundamental source of more and more UGC (user-generated content).

The phenomenon of UGC has also had a significant impact on the domain of Web Search, which happens to be my area of research: a study conducted in 2010 puts Facebook ahead of Google in terms of Web site hits. Many of the major search engines such as Google, Yahoo and Bing are now looking at ways to incorporate the Social Web into their search results. The WWW 2010 paper titled "Anatomy of a Large-Scale Social Search Engine" describes the phenomenon in considerable detail, and I recommend it as a must-read to those interested in the field. In fact the team behind this paper created a social search system, Aardvark, that has now been acquired by Google.

Despite the tremendous importance and attention being given to social search, one significant domain within this area has not yet been explored much, and a recent paper by my research group and me attempts to explore it. In this paper we present a system that aims to detect hot news items in real time by taking into account user popularity and temporal features. We present a prototype of the approach using the popular microblogging service Twitter, along with the results of some initial evaluations.

The proposed system analyzes real-time news using data from Twitter. We give a description of news services, followed by an architecture for assessing news popularity. The architecture is built upon a Web crawling framework and a news parser, followed by the application of natural language processing techniques to the news data, which is finally linked with the Twitter Search API. At the user interface end, we use a simple timeline-based visualization to showcase the popularity of news across time. Furthermore, data from the popular news service Dawn.com was crawled daily over a period of 10 days and analyzed for correlation with tweets; this analysis reveals interesting results such as the news bias exhibited by news services. Below is the paper, which can be downloaded as well.
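Before the paper itself, here is a rough illustration of the timeline idea: counting, per day, how many tweets mention the keywords of a news headline. This is only a minimal sketch under stated assumptions and not the code of the system in the paper. The function name `tweets_per_day`, the simple keyword-matching rule and the sample tweets are all made up for illustration, and no real Twitter API call is made.

```python
from collections import Counter
from datetime import datetime


def tweets_per_day(headline_keywords, tweets):
    """Count, per calendar day, the tweets that mention every keyword.

    `tweets` is a list of (created_at, text) pairs. In a real system these
    would come from a tweet search service; here they are passed in directly.
    """
    keywords = [k.lower() for k in headline_keywords]
    counts = Counter()
    for created_at, text in tweets:
        lowered = text.lower()
        # Hypothetical matching rule: every keyword must appear in the tweet.
        if all(k in lowered for k in keywords):
            counts[created_at.date().isoformat()] += 1
    # Sort by day so the result reads as a timeline.
    return dict(sorted(counts.items()))


# Made-up sample data, for illustration only.
sample_tweets = [
    (datetime(2011, 1, 3, 9, 15), "Flood relief funds announced today"),
    (datetime(2011, 1, 3, 18, 40), "More on the flood relief package"),
    (datetime(2011, 1, 4, 8, 5), "Cricket squad named for the tour"),
    (datetime(2011, 1, 4, 11, 30), "Flood relief: where is the money going?"),
]

timeline = tweets_per_day(["flood", "relief"], sample_tweets)
for day, count in timeline.items():
    # A crude text "timeline": one '#' per matching tweet.
    print(day, "#" * count)
```

Plain substring matching is the simplest possible stand-in for the natural language processing step; a real pipeline would need something more robust to decide whether a tweet is about a given news item.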




The paper will be published in the proceedings of the workshop "Visual Interfaces to the Social and Semantic Web (VISSW 2011)", co-located with the International Conference on Intelligent User Interfaces (IUI 2011), to be held at Stanford University in February 2011. I am sharing it on my blog at the request of some students who have shown a lot of interest in the field. For further details, questions or feedback, a personal email to arjumand_younus@yahoo.com would be preferred. Also, students willing to work with our research group in this dimension may contact me in person. I will also be uploading the slides and talk for this paper soon.

Thursday, December 30, 2010

Master's Thesis: Design and Implementation of a Scalable High-Speed Parallel Web Crawler

I have been planning to share this for quite some time, and today I finally found the time to do so. My Master's thesis covers a very fundamental component of search engines, namely Web crawlers. The research focus of my work is crawler efficiency, which is related to the scalability and speed of a Web crawler.

The proposed architecture extends the DRUM technique proposed in the best paper of WWW 2008, titled "IRLbot: Scaling to 6 Billion Pages and Beyond". That technique is used for a single-machine Web crawler; in the thesis, I extend it to a parallel crawler.
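To give a flavor of what going from one machine to many involves, here is a minimal sketch of one common way of splitting URLs among crawler nodes: hashing the host name so that all pages of a site go to the same node. This is an assumption chosen for illustration, not necessarily the partitioning scheme used in the thesis, and it does not implement DRUM itself. The function name `node_for_url`, the node count and the example URLs are all hypothetical.

```python
import hashlib
from urllib.parse import urlsplit


def node_for_url(url, num_nodes):
    """Map a URL to a crawler node index based on its host name.

    Hashing the host (rather than the full URL) keeps every page of a
    site on one node, which makes per-site politeness easier to enforce.
    """
    host = urlsplit(url).hostname or ""
    # Use a stable hash; Python's built-in hash() varies between runs.
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


# Hypothetical example URLs and node count.
urls = [
    "http://example.com/index.html",
    "http://example.com/news/today.html",
    "http://example.org/about",
    "http://example.net/contact",
]

assignment = {}
for url in urls:
    assignment.setdefault(node_for_url(url, 4), []).append(url)

for node, assigned in sorted(assignment.items()):
    print(node, assigned)
```

With this kind of split, a node that discovers a link belonging to another node's hosts has to forward it there, and that exchange of URLs is where much of the design effort in a parallel crawler goes.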

Following is my Master's thesis defense presentation; I successfully passed the defense on 16th December, 2010.

The full text of the thesis will be available soon and can also be requested via email. Interested students/researchers may contact me with any questions, comments or feedback, and any researcher interested in the domain of Web crawling may also contact me with suggestions.

Tuesday, December 7, 2010

When There's Nothing to Eat in Korea There's Always Kimbab

Back in Pakistan, when we had no time to cook anything, we had the option of Maggi noodles. But in Korea almost all noodles have some sort of pork (or pork ingredients), so is there no 2-minute option? Nopes, there is the all-time Korean favorite, Kimbab. Bab in Korean means cooked rice, so Pakistanis describe it as a rice burger :), quite unusual and strange, right?

These days my husband and I are in the Master's thesis defense phase, so most of the time we have no time but want something quick to eat, and Kimbab comes to the rescue.

Gimbap or kimbap is a popular Korean dish made from steamed white rice (bap) and various other ingredients, rolled in gim (sheets of dried laver seaweed) and served in bite-size slices. Gimbap is often eaten during picnics or outdoor events, or as a light lunch, served with danmuji or kimchi. It is similar to the better-known Japanese sushi. We eat the Yaachae (vegetable) or chamchi (tuna) one, as all the others have meat. Here are images of two variations of Kimbab. The first one is served in restaurants with a very delicious soya soup that is excellent in winter.
The second one is found in stores, packed like a sandwich, and is really cheap.






So now I am off to buy my Kimbab from the KAIST cafeteria store :)

Wednesday, November 17, 2010

South Korea: the Pioneer of Social Networks

Social networks have considerably revolutionized Internet usage patterns. Some studies have estimated that users now spend more time on social networks than on search engines. The most popular social network services include Twitter and Facebook.

Today I will shed some light on the use of social networks in South Korea and their role in what can be termed the social network revolution.

To the surprise of many readers of my blog, South Korea is the pioneer of social networks, even though Facebook is not widely used there. In South Korea, Twitter and Facebook are dwarfed by local social networks, the most popular of which is Cyworld by SK Communications. Cyworld is used by 78 percent of Koreans with Internet access. The Facebook-style service was a pioneer in social networking, launching way back in 1999 and launching as a mobile service in 2004. SK’s instant messaging service, known as NateOn, is also the most widely used in Korea, and is three times more popular than Microsoft’s Live Messenger. It is a popular myth, especially among Pakistanis, that the credit for making social networks popular goes to Mark Zuckerberg and the like, whereas the real pioneer is South Korea. Furthermore, South Korea's Cyworld was so popular and unique that a US counterpart by the name of Cyworld US was launched in 2006. In Korea social networks are not just a social gathering point; they are a very successful business model, and according to my Korean lab mates, social networks have been known to Koreans since the age of dial-up Internet.

Korea's example is, in my view, one that we Pakistanis can learn from. Back in Pakistan, when Facebook was banned after the "Draw Muhammad" fan page controversy, some people argued that creating a local social network is not a workable solution since we need to communicate internationally. That argument, in my opinion, comes from defeated minds and from a lack of knowledge of the science of social networks and their evolution patterns. Cyworld is a forum that the international community has joined (yes, many foreigners joined Cyworld to interact with Koreans) and learned from, which has led Korea to earn a respectable place in the world of SNS and mobile services, and all of this has increased international communication. Moreover, the data users share on social networks is a critical asset for any country and should not become a commodity for another country, which is currently the case because we have no idea of the significance of local social networks.

For all the above-mentioned reasons, Korea's case holds important lessons for many countries in Asia, and in particular for Pakistan, where anything foreign, and in particular anything from the US, is considered irreplaceable, whereas many other countries of the world are moving in a different direction.