Wednesday, July 21, 2010

[From SIGIR 2010]: KeyNote on Refactoring the Search Problem

The largest forum for researchers in the Information Retrieval community, ACM SIGIR, is underway in Geneva, Switzerland; it began yesterday. The best thing about today's social networking platforms is that even if you are not at the conference, you can stay up to date with all the talks, papers and innovative ideas being presented. Thanks to Twitter and the blogosphere, much of what is happening at SIGIR 2010 is reaching me live :)

This year's SIGIR conference has 15 papers from Microsoft Research, which suggests that Microsoft is putting a lot of effort into IR and that its researchers are working hard to make Bing better and better.

Yesterday's keynote at SIGIR was on refactoring the search problem; in it, Gary W. Flake of Microsoft Live Labs described and demonstrated Microsoft Pivot.

From what I see, Pivot seems to be a cross between HCI (Human-Computer Interaction) and Information Retrieval. Watch this TED talk for a live demo of Pivot:



Pivot's claim is to lift the curse of information overload in this information age by making the user's experience closer to searching than to simple browsing. This, he said, is achieved by taking raw data and combining it with metadata for faceted navigation. The idea seems promising, but to me it looks largely borrowed from Wolfram Alpha, which has already experimented with this type of search engine and calls it a computational knowledge engine: http://www.wolframalpha.com

Some hard challenges remain, including server-side issues and an open question: is this style of search a good model for all kinds of searches? Time will tell, as Microsoft plans to integrate Pivot technology with its recently released search engine, Bing.

I will be blogging more on some key papers and talks at SIGIR. If you are interested in live updates, follow the hashtags #sigir2010, #sigir and #sigir10 on Twitter.

Friday, July 2, 2010

Paper Reviews: A Great Learning Experience

Reading a good published paper is very different from reading an unpublished paper submitted to a conference. Most papers presented at decent conferences are well organized and reasonably well written, and those are probably the only papers you have read in classes and for your research. When you review as part of a program committee, you get to read quite different papers, and many of them are poorly written in one way or another. I realized this when I did my first paper review for this year's CIKM conference, a reputed venue where my professor is on the program committee. The review task is assigned to all PhD students in our lab as a way of learning how to write a good paper, the premise being that if you can review papers well, you can also write good ones. Luckily, I was the only MS student who got to do a review, because one of the papers assigned by CIKM was related to my MS thesis topic.

The review process is rigorous, with a whole round of discussions between the students and seniors (PhD students and postdocs). We read every paper carefully, identifying its problem statement, the related work in that area and the solution it proposes. We then identify the strong and weak points of each paper, which is of course the tough part and is the deciding criterion for accepting or rejecting it.

Although time-consuming and cumbersome, this whole activity offers a lot to learn, especially for students like me, since you are in the shoes of a reviewer who reads the papers you submit to conferences. You are reminded of the dos and don'ts of submitting your own paper, and this is the entire point of the activity. Reading these papers teaches you how to write a good paper by following some essential guidelines and keeping in mind the mistakes you should never make.

Most importantly, the effort that goes into the review process of these credible and prestigious conferences is what the Computer Science community in Pakistan should learn from. These days there is a whole "blogging" boom, with bloggers considering themselves extremely credible, which I also wrote about and criticized a few days back. People involved in Computer Science back home in Pakistan, and in other developing countries, need to learn that credibility comes from published, reviewed and innovative work. At the end of the day Computer Science is a science and has to be treated like one. Technologists may be good in their respective fields, but they lack the expertise needed to make a country prosper in the long run. Scientists are needed to create the technologies of the future, and this is what matters when it comes to the real development of a country.

I wrote this quick post to voice my concerns for the betterment of Computer Science in developing countries. Now back to the review job, as writing the review itself is another tough part which I have yet to finish.

Friday, May 28, 2010

Python and C++ Integration Tutorial

Today I am going to write for the wonderful Python community out there. The beauty of open source is its huge community and the help its members give each other through blogs, forums, mailing lists and so on. So after my struggle with Python and C++ a few days ago, I planned to write this post to help those who may be struggling with a similar problem.

Python is a great programming language in which you can do almost anything, as I mentioned in my previous post, but when it comes to raw performance and efficiency C/C++ is hard to beat. So there are times, especially in large-scale systems, when performance cannot be sacrificed and you have to resort to C/C++. But oh no!!! Most of your source code is already in Python. So what now?? Well, not to worry. Another great thing about Python is that you can extend it using C/C++, and that's exactly what I did. But it's easier said than done.

Calling C from Python is relatively easy, but calling C++ is the real headache, so I am sharing how to do it for any programmer who needs it. I had a tough time due to the lack of proper tutorials on the subject, and this also motivated me to write one :)

The sample class that we use here is Employee and we will use its data members and methods in Python. You can download the complete code here.

The Employee.cpp file contains a piece of code that does the actual work. Python's ctypes module looks functions up by their plain C names, while a C++ compiler mangles the names of C++ functions and methods. It is therefore necessary to provide C wrapper functions inside an extern "C" block. Here is the code that does it:

extern "C" {
    // Constructor wrapper: returns the new object as a plain pointer.
    Employee* Employee_new(){ return new Employee(); }
    // Method wrappers: the object pointer comes first, then the method's own arguments.
    void Employee_promote(Employee* emp){ emp->promote(); }
    void Employee_demote(Employee* emp){ emp->demote(); }
    void Employee_hire(Employee* emp){ emp->hire(); }
    void Employee_fire(Employee* emp){ emp->fire(); }
    void Employee_display(Employee* emp){ emp->display(); }
    void Employee_setFirstName(Employee* emp,char* inFirstName){ emp->setFirstName(inFirstName); }
    void Employee_setLastName(Employee* emp,char* inLastName){ emp->setLastName(inLastName); }
    void Employee_setEmployeeNumber(Employee* emp,int inEmployeeNumber){ emp->setEmployeeNumber(inEmployeeNumber); }
    void Employee_setSalary(Employee* emp,int inNewSalary){ emp->setSalary(inNewSalary); }
}


In this code every method wrapper returns void, because none of the wrapped methods returns a value we need, and the constructor wrapper returns a pointer of the form ClassName*. The tough part is methods that take arguments, which I had quite a hard time figuring out due to the lack of tutorials on the subject. The way to do it is this: within the extern block, the wrapper takes the object pointer as its first parameter, followed by the method's own parameter types, as shown here: void Employee_setFirstName(Employee* emp,char* inFirstName){ emp->setFirstName(inFirstName); }

Now the library generation part. Here's how to do it with g++:

g++ -c -fPIC Employee.cpp -o Employee.o
g++ -shared -Wl,-soname,libEmployee.so -o libEmployee.so Employee.o

This will generate the shared library libEmployee.so (you can give it any name of your choice), which can then be easily loaded from Python. Here is an example of Python code calling a C++ class, marvelous:

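from ctypes import cdll, c_void_p

lib = cdll.LoadLibrary('./libEmployee.so')
# Employee_new returns a pointer; by default ctypes assumes a C int,
# which can truncate the address on 64-bit systems.
lib.Employee_new.restype = c_void_p

class Employee(object):
    def __init__(self):
        # Wrap the address so it is passed back to C++ as a full-width pointer.
        self.obj = c_void_p(lib.Employee_new())

    def EmployeeTest(self):
        # char* arguments must be byte strings (b"...") under Python 3.
        lib.Employee_setFirstName(self.obj, b"Marni")
        lib.Employee_setLastName(self.obj, b"Kleper")
        lib.Employee_setEmployeeNumber(self.obj, 71)
        lib.Employee_setSalary(self.obj, 50000)
        lib.Employee_promote(self.obj)
        lib.Employee_promote(self.obj)
        lib.Employee_hire(self.obj)
        lib.Employee_display(self.obj)

emp = Employee()
emp.EmployeeTest()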
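
A slightly more defensive variant is to tell ctypes the signature of every wrapper up front, so that pointers are never mistaken for plain ints and a wrong argument type is rejected before it reaches C++. The sketch below is only one way to do it: the helper names declare_employee_api and load_employee_library are my own (they are not part of the downloadable code), and it assumes the wrapper functions are exactly the ones listed in the extern "C" block above.

```python
import ctypes

# Wrapper names taken from the extern "C" block in Employee.cpp.
VOID_METHODS = ("promote", "demote", "hire", "fire", "display")
STRING_SETTERS = ("setFirstName", "setLastName")
INT_SETTERS = ("setEmployeeNumber", "setSalary")


def declare_employee_api(lib):
    """Attach argument and return types to the Employee_* wrappers on lib.

    Hypothetical helper: lib is expected to be a loaded ctypes library.
    """
    # Employee_new takes no arguments and returns an Employee* pointer.
    lib.Employee_new.restype = ctypes.c_void_p
    lib.Employee_new.argtypes = []

    # Methods that only take the object pointer.
    for name in VOID_METHODS:
        fn = getattr(lib, "Employee_" + name)
        fn.restype = None
        fn.argtypes = [ctypes.c_void_p]

    # Setters that take the object pointer plus a char*.
    for name in STRING_SETTERS:
        fn = getattr(lib, "Employee_" + name)
        fn.restype = None
        fn.argtypes = [ctypes.c_void_p, ctypes.c_char_p]

    # Setters that take the object pointer plus an int.
    for name in INT_SETTERS:
        fn = getattr(lib, "Employee_" + name)
        fn.restype = None
        fn.argtypes = [ctypes.c_void_p, ctypes.c_int]

    return lib


def load_employee_library(path="./libEmployee.so"):
    """Load the shared library built with g++ and declare its signatures."""
    return declare_employee_api(ctypes.CDLL(path))
```

With the signatures declared, lib = load_employee_library() followed by obj = lib.Employee_new() and lib.Employee_setFirstName(obj, b"Marni") works with a plain integer address, and passing a text string instead of a byte string raises ctypes.ArgumentError rather than handing C++ a bad pointer.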


So ctypes does the trick and makes it quite simple. There are also alternatives such as Boost.Python and SWIG, which I have not yet explored but will in the near future, as I have to work with Python and C++ for my MS thesis research. So stay tuned for more.

Friday, May 14, 2010

The Python Experience

Today I will write an introductory post on Python. A few days back a student said to me, "Python must be hard", and she is the main reason I came up with this post.

In one phrase, I would say that Python is the best of both worlds: it offers much of the power of traditional compiled languages like C and C++ together with the ease of use and simplicity of interpreted scripting languages like Perl and Tcl. In the world of Python, imagination literally becomes the limit for the programmer.

Few people know that Python is used for major tasks by organizations like Google, Yahoo!, NASA, Red Hat, Pixar, Disney and DreamWorks. In fact, what we see today as Yahoo! Mail was once the Rocketmail web-based email service, and it was designed in Python. Many universities are planning to use Python to teach introductory programming so that students can focus on problem-solving skills instead of being bogged down by the difficulty of the language, and some, including MIT, have already started to do so.

So here, for the students back in Pakistan, are the basic features of Python which make it so appealing and powerful:

  • Of course it is a high-level language. Its beauty lies in its higher-level data structures, which reduce development time significantly.
  • Python has support for object-oriented programming; in fact, it is object-oriented all the way down to its core.
  • With Python the code is compact, and that is also the beauty of it. Python is often compared to batch or Unix shell scripting languages, but with those there is little code reusability and you are confined to small projects. In fact, even small projects may lead to large and unwieldy scripts. Not so with Python, where you can grow your code from project to project, add other new or existing Python elements, and reuse code at your whim.
  • Python's portability is part of what makes it one of the most widely used programming languages today; it can be found on a wide variety of systems. The standard interpreter is written in C, and thanks to C's portability Python is available on practically every platform that has an ANSI C compiler.
  • Python is extremely easy to learn and students can grasp it very quickly, so I would certainly recommend it to the many students reading this post. If you need any sort of help, feel free to contact me.
  • Python code is very easy to read, so much so that even a reader who has never seen a single line of Python can begin to follow it almost immediately.
  • Python code is extremely easy to maintain: if you review a piece of code you wrote some months back, you will be able to grasp it in no time.
  • Python is robust to errors. Python provides "safe and sane" exits on errors: when your program crashes, the interpreter dumps out a "stack trace" full of useful information such as why your program crashed and where in the code (file name, line number, function call, etc.) the error took place. These errors are known as exceptions. Python even gives you the ability to monitor for errors and take evasive action if one occurs at runtime. These exception handlers can take steps such as defusing the problem, redirecting program flow, performing cleanup or maintenance, shutting down the application gracefully, or just ignoring the error. In any case, the debugging part of the development cycle is reduced considerably.
  • Now comes the one great thing I simply love about Python: numerous external libraries have already been developed for it, so whatever your application is, someone may have traveled down that road before. All you need to do is "plug-and-play". There are Python modules and packages that can do practically anything, from natural language processing with NLTK to scientific computing. If you cannot find what you need in the standard library, chances are high that there is a third-party module or package that can do the job.
  • Python has its own memory manager. What makes C and C++ so burdensome is that memory management is the developer's responsibility: the programmer has to take care of the dirty work of allocating and freeing memory. With Python this headache is gone.
  • Python is classified as an interpreted language. Traditionally, purely interpreted languages are almost always slower than compiled languages because execution does not take place in the system's native binary language. But, like Java, Python is in reality byte-compiled, i.e. source code is translated into an intermediate form closer to machine language. This improves Python's performance while allowing it to retain the advantages of interpreted languages.
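
If you are curious about the byte-compiled form mentioned in the last point, you can look at it yourself with the standard dis module. This is just a small sketch: add_one is a made-up function for illustration, and the exact instructions printed will differ between Python versions.

```python
import dis
import types


def add_one(x):
    """Made-up example function whose bytecode we inspect."""
    return x + 1


# Every function carries a code object that holds its compiled bytecode.
code = add_one.__code__
print(isinstance(code, types.CodeType))

# Print a human-readable listing of the bytecode instructions.
dis.dis(add_one)

# Source text can also be byte-compiled explicitly and then evaluated.
compiled = compile("1 + 2", "<demo>", "eval")
result = eval(compiled)
```
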
So, dear students, I would certainly recommend that all of you give Python a go, as in the long run it will really benefit you.
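
To make the point about exception handlers a little more concrete, here is a minimal sketch. The function parse_salary and its default value are made up for illustration; the idea is simply that a bad input is caught and handled instead of crashing the program.

```python
def parse_salary(text, default=0):
    """Convert text to an integer salary, falling back to a default.

    Hypothetical helper, used only to illustrate try/except/else/finally.
    """
    try:
        salary = int(text)
    except ValueError as err:
        # Defuse the problem: report it and redirect to a safe value.
        print("could not parse %r: %s" % (text, err))
        return default
    else:
        # Runs only when no exception was raised.
        return salary
    finally:
        # Cleanup step: runs whether or not an error occurred.
        print("finished parsing %r" % (text,))


good = parse_salary("50000")
bad = parse_salary("fifty")
```

Here good ends up as 50000, while the second call falls back to the default of 0 after printing a message, and the program carries on instead of stopping with a stack trace.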

Tuesday, May 11, 2010

Is Search Really Dead??

Gone are the days when I had to go through newspaper sites or search engines to find out the result of a late-night match that I could not watch the previous night. Today all I do is log in to my Facebook account and there it is: the latest news right before me. So where are we heading?

This post is a little glimpse into the future of information retrieval. Yes, today I have decided to throw some light on my research area, not from too technical a standpoint but from an interesting one that opens a whole new area of research. The concept is becoming so important that a special panel was devoted to it at this year's WWW 2010 conference in Raleigh, USA. The WWW Conference is the world's most renowned platform for Web researchers, and the panel's theme was "Search is dead." The panel comprised the following people:

Andrei Broder – Fellow and VP, Search & Computational Advertising, Yahoo! Research.

Marti Hearst – Professor, School of Information, University of California-Berkeley.

Barney Pell – Partner, Search Strategist for Bing, Microsoft.

Andrew Tomkins – Director of Engineering at Google Research.

Prabhakar Raghavan – (Co-organizer and Moderator) Head, Yahoo! Labs.

Elizabeth Churchill – (Co-organizer) Principal Research Scientist, Yahoo! Research.

If I had to sum up the discussion in one line, it would be: "Search as we traditionally know it is already dead!!!!"

User expectations of search engines are already changing: the traditional experience of typing in a query and being offered 10 blue links in response now amounts to a failure. Completeness is the key here: users want search engines to incorporate all the richness available these days, as there are now far more diverse data sources and presentations than links to web pages. So with user needs getting "weird" and competition in the search business getting "fierce", we are at the doorstep of yet another information retrieval (IR) revolution after PageRank.

Well, it boils down to an important question: who is a key player in all this??? Any guesses??? Yes, social networks like Twitter, Facebook, MySpace and Orkut. Mark Zuckerberg's statement at Facebook's F8 conference, "We are building a Web where the default is social", throws a lot of light on this entire phenomenon, and in particular on the new Facebook Platform, crucial parts of which are the Open Graph and Social Plugins. Already, statistics say Facebook has surpassed Google in Internet traffic, and this may be the beginning of the new revolution.

So what's your say on it? Is search really dead?? I will reserve my own thoughts on this for some time :) and would love to hear from my readers.

In the end, a short video to throw some more light on this social media revolution:




Saturday, April 10, 2010

Is blogging really something to brag about?

Today I was shocked while reading a debate on Facebook about whether HP products are good or not. One argument really shocked me, and when I came out of that shock I landed in worry. To some the comment may not be worrying, but if you look at it analytically it surely signals an emergency for Computer Science in Pakistan. The comment was "I am a blogger and I write about HP!!!" Seemingly it is harmless, but the authority with which it was said left me astonished.

My astonishment comes from the level of significance these bloggers are claiming for themselves. Fine, you blog about stuff, you do have a voice in the news arena and your blog may be shaping some people's minds, but does that mean that every word you write is credible and warrants acceptance? Your professional work is what makes you credible. If blog posts were all that mattered, there would be no such thing as research value earned through publications and citations.

It really bothers me to see the state of Computer Science in Pakistan and what's more disappointing is the attitude of the people in the field. They do not even realize where their ignorant and arrogant attitude is going to take the country and on being corrected they strike back with more arrogance and ignorance.

I am afraid that if this trend continues it will keep us a mere consumer society, which we already are. Here I have my professors working on mining and analyzing the blogosphere, while back there we just have people content with writing blogs, so we are just the data for the research of the future.

Things really have to change and they have to change fast!!!

Friday, February 19, 2010

Writing a Research Paper: Important for Those Thinking about FYP

I have been approached by many students, especially those in the final year of BS Computer Science, for tips and basic guidelines on writing research papers. So today I decided to blog about it. These techniques are based solely on my own experience, and anyone with additional suggestions is welcome to contribute.

Writing a paper is not a complex task at all: the most important thing in my opinion is to love what you're doing and be really excited about it. In the words of Meeyoung Cha:

"If you fancy a career as a researcher, you'll spend tens of thousands of hours on work over the next 10 years. The only way you're ever gonna spend 10,000 hours on research is only when you truly deeply love it. If something really engages you and makes you happy, then you will put in the kind of energy and time necessary to become an expert at it." - Click for Source

So find a research topic that really fascinates you and makes you want to invest hours and hours in it without any regrets.

Divide your activity into phases:
1) Literature Survey
2) Design Phase
3) Implementation Phase
4) Experimental and Analysis Phase
5) Writing the Paper

Literature Survey
Once you have identified your area, the real task begins: narrow it down to a specific domain in which you want to work, i.e. find a problem, and conduct a literature survey, i.e. see what approaches the research community has proposed for that particular problem. For example, say the problem comes from the web crawling domain. Look up the best conferences in that domain, such as SIGIR, WWW, CIKM and ICDE, browse through their proceedings and read the well-known papers relevant to your problem. To take another example from Computer Networks, suppose you want to take up a problem in Network Virtualization; then go through well-known conferences such as NSDI, SIGCOMM, SIGMETRICS, OSDI and SOSP and read papers by some of the renowned researchers in the field.
This phase is the most important and crucial one, for it is where your innovation and ideas come from, and it serves as a prelude to the next phase, design.

Design Phase
This phase is the core of your research project. From the literature survey you conducted in the last phase, you should have identified some potential shortcomings of each previously proposed solution. Now is the time to pull together everything you gathered and design your own effective solution to the problem at hand. During this phase I would personally advise taking suggestions from researchers working in the domain: the research community is collaborative, and many would love to hear your approach to the problem. You can even write to the authors of papers you read during the literature survey, and they might write back with further suggestions.

Implementation Phase
As is obvious from the name, in this phase you do the programming. You can choose any platform suited to your research needs. The best thing about the sciences is that there is no restriction on the development platform, but try to choose one that has strong support from the worldwide community of scientists and engineers, so that finding help is easy if you get stuck. And if you choose a platform that lacks such support then bravo, chances are you might end up being one of the pioneering researchers to implement your idea on that platform.

Experimental and Analysis Phase
Make no mistake about it, this phase is also very crucial and is what adds the real value to your work. It is what distinguishes it from the rest of the work in the field and highlights your research contribution. In this phase you must compare your approach with existing approaches using meaningful comparison metrics, for example bandwidth, throughput and jitter in the case of computer networking research. This phase also forms a valuable part of your research paper, so don't overlook its significance and give it due care.

Writing the Paper
1- Save all related data in one folder, including results, pictures, any text from your literature survey, previous reports you have written for the project (if any), etc.
2- Download the specific format of the conference/journal from its website. It will be easier for you to write directly in that format than to write in a blank Word file first and then struggle/fight with Word.
3- Write the abstract at the end, rather than at the start.
4- The easiest way is to first write all the first-level headings, e.g. (JUST AN EXAMPLE) Introduction, System Overview, Algorithm Design, Results, Conclusion, References, etc. The next step is to write the second-level headings (wherever applicable).
5- Then make tables, flow charts and figures, and paste HIGH-resolution images under the corresponding headings.
6- Now your paper needs text. If you are going to copy/paste text from self-written reports, do not forget to keep continuity in your script, because something might be clear to you (as you have worked on this research/project) whereas for others there might be some "hidden details".
7- After you finish your paper, write a short summary in the form of an abstract and assign proper keywords to your paper. The abstract should be so concise and compelling that the relevant reader is forced to read your paper!
8- Read your paper yourself a number of times, and then ask a friend (with good English grammar) to go through it to correct a/the/an etc.
9- Remember a very important norm: you should inform and give a copy of your paper to all those Professors/Persons whose names you have mentioned in it. One should NOT submit any paper to any conference without the knowledge of the co-supervisor and Professor.
10- Try to submit the paper at least some hours before the deadline, because just before the deadline hour a lot of people upload and the system/server gets jammed. If this does happen to you, just send the paper to the organizers by email (mentioning the problem and the time of submission).
There are a lot of other related things which one learns from personal experience. Remember, no work is worthless!! You just have to present it in a proper and systematic way and find a conference which matches the technical depth of your paper. Most conferences in Pakistan provide a good nursery for writing and presenting your work, so do not underestimate your work. If anyone has any query, please feel free to ask me at my personal address (arjumand_younus@yahoo.com), or I would recommend discussing it on the group:


Lastly, a very important announcement for final-year students: I have many research ideas and am looking for students for research collaborations. If anyone wants to pick up an idea for his/her Final Year Project, feel free to contact me at the email address given above or on the Yahoo group.

All the very best in this venture of final year project!!!