24 Aug

English Blog has moved to Wordpress

Please visit http://lifidea.wordpress.com/ for my English blog!

25 Apr

iLab -- A Platform for IR Experiment

One of the things you realize as a CS grad student is that writing good code is not considered as important as you thought it would be. Since research code only needs to produce the data for your experiment, this downplaying of coding seems to make sense.

When I ask my fellow grad students, their response is: “Why do you care about code that will be used only once?” For this reason, most people end up writing ad-hoc scripts that are seldom reused (or even re-read, since they typically use Perl, a write-only language).

I take a different view. Three years of experience as a software developer taught me that it is not good for your well-being(!) to see ugly code every day. And the experiments we run as IR grad students are not entirely different from one another every time.

A year and a half has passed since I got here, and I seem to know a bit more about IR experiments than before. For every new experiment I ran, I tried to extend and generalize existing code rather than start from scratch, which left me with a considerable amount of Ruby code. (Yes, my choice of language is Ruby. After all, it is a language purportedly designed for the pleasure of programming. How appealing is that?)

The resulting software, dubbed ‘iLab’, consists of a framework that is common to every experiment and a part that supports individual experiments. Since the framework provides object abstractions for the usual things IR experimenters deal with (documents, queries, retrieval engines), you can build your experiment just by calling it.

The advantage of using iLab over writing ad-hoc code for each experiment seems evident. When you want to work on a new collection or task, you can do it with a simple set of API calls. Compare this with having to copy and paste existing code, which results in a pile of buggy code you may never want to look at again.

iLab also enables you to do things which would not be possible at all in ad-hoc scripting. For instance, you can build a more sophisticated experiment by combining smaller, simpler experiments. If you need to run a cross-validation of some machine learning algorithm, this can be a useful feature.
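
As a rough illustration of building a larger experiment out of smaller ones, here is a minimal sketch of k-fold cross-validation. It is not iLab's API (iLab is written in Ruby and its interface is not shown in this post); the function names `make_folds` and `cross_validate`, and the `train` and `evaluate` callables, are hypothetical and exist only for this example.

```python
from statistics import mean


def make_folds(queries, k):
    """Split a list of queries into k folds, assigning them round-robin."""
    return [queries[i::k] for i in range(k)]


def cross_validate(queries, k, train, evaluate):
    """Run k small train/evaluate experiments and average their scores.

    `train(queries)` returns a model (any object) and
    `evaluate(model, queries)` returns a float. Both are hypothetical
    callables supplied by the caller, standing in for smaller experiments.
    """
    folds = make_folds(queries, k)
    scores = []
    for i, test_fold in enumerate(folds):
        # Train on every fold except the one held out for testing.
        train_queries = [q for j, fold in enumerate(folds) if j != i for q in fold]
        model = train(train_queries)
        scores.append(evaluate(model, test_fold))
    return mean(scores)
```

Each fold is an ordinary small experiment; the cross-validation itself only decides which queries each one sees and how the scores are combined.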

If you’re interested, here are the slides that briefly introduce iLab. Also, check out the following experimental result, which is auto-generated by iLab. (Tip: if you click a heading, you can sort the reports by that criterion.)

Since iLab is not yet distribution-ready, please let me know if you want to try it, so that I am motivated to put in further effort.

Tags : IR,Experiment
24 Feb

A Probabilistic Retrieval Model for Semistructured Data

My first paper ‘A Probabilistic Retrieval Model for Semistructured Data’ (joint work with Xiaobing Xue and W. Bruce Croft) will be presented at ECIR’09 in Toulouse, France.

It started as a course project in Advanced Database. As a natural intersection of databases and IR, the semi-structured (XML) document retrieval problem drew my attention.
A simple literature review revealed that most past work focused on choosing the right granularity (XML element) for retrieval. Also, most of that work assumed a structured query (XPath) rather than a keyword query.

I wanted to see the problem differently. The first obvious thing was that formulating an XPath query is beyond the capability of average users (it’s hard even for me!).
And thinking about typical querying behavior made me realize that users implicitly map each query term to some aspect of the item they are looking for.

Let’s assume a user is trying to find the movie ‘French Kiss’ with partial information about the cast (‘meg ryan’) and genre (‘romance’). He or she may type ‘meg ryan romance’, yet it is clear which aspect of the data (the movie) the user meant by each query word. And we can infer this mapping between query terms and document fields by Bayesian estimation (more detail in the paper).

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <movie>
       <title>French Kiss</title>
       <year>1995</year>
       <releasedate>USA:5 May 1995</releasedate>
       ...
       <language>English</language>
       ...
       <genre>Comedy</genre>
       <genre>Romance</genre>
       ...
       <country>USA</country>
       <actors>
          ...
          <actress>Ryan, Meg I</actress>
       </actors>

       <team>
          <director>Kasdan, Lawrence</director>
          ...
       </team>
       <plot>
          An American woman Kate(Meg Ryan) goes on trip to France
          in a desperate effort to find her romance back.
          ...
       </plot>
    </movie>

Given this observation, and taking into account that each aspect of the information is encoded in a different XML element, it is natural that a ranking algorithm for this kind of document can benefit from the mapping between query words and document fields. The solution is to put a higher weight on the element that seems to be what the user intended. In the example above, the cast element (‘actors’ in the XML) needs to be weighted higher for ‘meg ryan’, and the same can be said of the ‘genre’ element and ‘romance’.
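
To make the idea concrete, here is a minimal sketch of the mapping and the weighted scoring. It is a simplification under stated assumptions, not the exact estimation from the paper: it assumes a uniform prior over fields, additive smoothing with a made-up `alpha`, a fixed mixing weight `lam`, and a toy two-movie collection flattened into three fields (`title`, `cast`, `genre`) that I invented for illustration.

```python
import math
from collections import Counter

FIELDS = ["title", "cast", "genre"]

# Toy collection, made up for illustration only.
DOCS = [
    {"title": "french kiss", "cast": "meg ryan kevin kline", "genre": "comedy romance"},
    {"title": "a romance for ryan", "cast": "jane doe", "genre": "drama"},
]


def tokens(text):
    return text.lower().split()


def collection_counts(docs):
    """Term counts per field, summed over the whole collection."""
    counts = {f: Counter() for f in FIELDS}
    for doc in docs:
        for f in FIELDS:
            counts[f].update(tokens(doc.get(f, "")))
    return counts


def p_term_given_field(term, field, counts, vocab_size, alpha=0.1):
    """Additively smoothed P(term | field) at the collection level."""
    total = sum(counts[field].values())
    return (counts[field][term] + alpha) / (total + alpha * vocab_size)


def mapping_probs(term, counts, vocab_size):
    """P(field | term) by Bayes' rule, assuming a uniform prior over fields."""
    likelihood = {f: p_term_given_field(term, f, counts, vocab_size) for f in FIELDS}
    norm = sum(likelihood.values())
    return {f: likelihood[f] / norm for f in FIELDS}


def score(query, doc, counts, vocab_size, lam=0.5):
    """Log of prod_i sum_j P(field_j | q_i) * P(q_i | field_j, doc)."""
    total = 0.0
    for term in tokens(query):
        weights = mapping_probs(term, counts, vocab_size)
        mix = 0.0
        for f in FIELDS:
            field_terms = tokens(doc.get(f, ""))
            p_doc = field_terms.count(term) / len(field_terms) if field_terms else 0.0
            p_coll = p_term_given_field(term, f, counts, vocab_size)
            # Field-level language model, smoothed with the collection model.
            mix += weights[f] * (lam * p_doc + (1 - lam) * p_coll)
        total += math.log(mix)
    return total


counts = collection_counts(DOCS)
vocab_size = len({t for c in counts.values() for t in c})
```

With this toy data, `mapping_probs("meg", counts, vocab_size)` puts most of its mass on `cast` and `mapping_probs("romance", counts, vocab_size)` on `genre`, so a match in the intended field counts for more than the same word appearing elsewhere.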

This simple idea later turned out to improve retrieval performance significantly. The performance gain was more noticeable for collections with clear semantics (e.g. movie descriptions), since it was easier for the system to map each query word to the correct document field.

I’m currently working on applying this retrieval model to the desktop search problem, where the XML data are replaced by documents with metadata fields.

Tags : Paper
25 Jan

Significance Test is Significant

One of the common questions IR researchers ask is: ‘Is this new retrieval method better than the old one?’ Mostly, we turn to statistics for the answer, using a method called a ‘significance test’.

Given two sets of performance measurements (typically MAP or Precision@K) for the two systems, we run a significance test and get the probability of observing a difference at least as large as the one we measured if both result sets came from the same distribution. If this probability is smaller than some predetermined value (e.g. 0.05), we conclude that there is a significant difference between the performance of these systems.

In statistical terms, the probability here is called the ‘p-value’, and the assumption that both sets come from the same distribution is called the ‘null hypothesis’, which we may hope to reject (especially if we devised the new method).

As you may guess, there are many significance tests used in IR, differentiated by the assumptions they make (about the underlying distributions, and so on). According to a recent paper comparing these methods, the randomization test, the bootstrap test and the t-test give the same results, while the Wilcoxon and sign tests, which are simplified forms of the randomization test, give different results from the others and are therefore discouraged.
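
For a feel of how such a test works, here is a minimal sketch of a two-sided paired randomization test on per-query scores. The function name, the number of trials and the fixed seed are my own choices for illustration; it assumes both systems were scored on the same queries, in the same order.

```python
import random


def randomization_test(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided paired randomization test on the mean score difference.

    Under the null hypothesis the labels "A" and "B" are interchangeable
    for each query, so each per-query difference is equally likely to have
    either sign. Returns the estimated p-value.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same queries")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(trials):
        # Randomly swap the two systems' scores query by query.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        # Small tolerance guards against floating-point rounding.
        if abs(permuted) >= observed - 1e-12:
            extreme += 1
    return extreme / trials
```

If the returned value is below the chosen threshold (e.g. 0.05), the null hypothesis is rejected and the difference is called significant.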

Tags : Essay,IR,Statistics
21 Jan

Recommended Reading for IR Research Student

This is a survey article I found while taking an IR class last fall. While the article seemed interesting from its title, I couldn’t get a good grasp of it, as I had little understanding of the IR field in general.

When I read it again a few days ago, I finished it with much greater interest. Not only did it provide me with a well-chosen reading list, but it also gave me a clue about IR research trends, seen from the perspective of which papers were published and well received.

Since my life as a grad student will revolve around papers, it should be worthwhile to summarize the lessons I learned.

What makes a ‘classic’ paper?

My first question was why this handful of papers was chosen from among the thousands of IR papers published so far. What made them so special?

Novelty

Any research paper should be ‘new’ in some way; that’s what makes it ‘research’. Yet many of these classic papers stand out for the depth and usefulness of their novelty. Some of them asked questions people had never even thought of before, others provided a whole new perspective on an existing problem, and still others applied existing techniques and theory to a new class of problems or brought in knowledge from other fields to solve an IR problem.

While the selected papers are top-quality by most other criteria, some were selected despite obvious limitations in methodology or performance, which shows the value of bringing in new ideas and approaches.

Result

Since IR is a field rich in performance metrics (although few seem to know which one is best), work with improved performance is noteworthy. Based on my five months of observation at CIIR, there seem to be many cases in which a method with superior results comes first, by some chance or mistake(!), followed by its theoretical justification.

Of course, given that their performance improvement is consistent and significant, most of these ‘result’ papers prove to be novel later on.

Methodology

If ‘novelty’ papers excel at finding a problem worth solving, other papers draw attention for how they solve a problem. Even without a groundbreaking idea or superior results, these papers are read by many people because they teach valuable lessons, mostly in terms of experiment design and interpretation. These ‘methodology’ papers should be especially valuable for students just entering the field.

Survey

As a topic becomes established as a field of research and results accumulate over time, it becomes increasingly difficult for individual researchers to keep up with the results of past research. That’s where survey papers are needed, in which most of the major discoveries are summarized in a single paper.

Which track should one pursue?

Given these conditions for good papers, researchers may ask themselves what their strategy should be, since most papers seem to be strong on one particular criterion (although a good number of papers qualify on every criterion).

Here’s my crude suggestion. (Although I know I don’t know well enough to make this kind of remark.)

  • Novelty Papers : Pursue these if you are confident in your creative potential, that is, if you tend to pinpoint things most people would not come up with. To succeed this way, you should read widely (even in related fields), which may give you a useful combination of ideas no one has thought of before.
  • Result Papers : If you’re good at tweaking a variety of retrieval parameters and settings, getting superior results may be easier for you. All you need to do then is find the theory that ‘explains’ your results.
  • Methodology Papers : If you have the experience and rigor to investigate a given issue better than most people, you may turn most problems you work on into quality research papers, even without good results(!).
  • Survey Papers : This kind of work is probably best left to guru-level researchers whose careers mirror the advancement of the field itself.

Tags : Essay,IR
29 Nov

Having a Right Measure for IR

This might be my first posting as an IR researcher. I just joined the Information Retrieval Lab at UMass, and I am busy getting used to life in the USA while starting my career as a researcher.

While I have always considered blogging a good pastime, I decided that I may even need to blog for the sake of my research. It may help me develop immature research ideas, learn how others think differently and see things from a more relaxed perspective; research is more fruitful when it is not treated as mere work.

As a novice blogger, I started by reading what others wrote about research. An article about choosing the right measure drew my attention today. In most AI-like problems, having the right measure is critical, since otherwise you would be optimizing in the wrong direction. Conversely, once you have the right measure, you can improve your results and find out how well your method worked.

The author (Hal) says that the F-measure (the weighted harmonic mean of precision and recall) is desirable for classification problems where the problem space can be divided into answer vs. non-answer, in addition to the well-known rarity reason: accuracy is useless when one class (usually non-answer) makes up the majority. This assertion seems sound in the sense that precision and recall, the components of the F-measure, are defined assuming the division of the problem space suggested by the author.
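
The rarity argument is easy to check with a small sketch. The function names and the counts in the usage note below are made up for illustration; `beta` is the usual weight of recall relative to precision.

```python
def precision_recall_f(tp, fp, fn, beta=1.0):
    """Precision, recall and F-measure from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    # Weighted harmonic mean of precision and recall.
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


def accuracy(tp, fp, fn, tn):
    """Fraction of all items that were labeled correctly."""
    return (tp + tn) / (tp + fp + fn + tn)
```

Suppose 10 out of 1000 items are answers and a classifier labels everything as non-answer. Then `accuracy(0, 0, 10, 990)` is 0.99, while `precision_recall_f(0, 0, 10)` gives an F-measure of 0.0, which is the more honest summary of a classifier that finds nothing.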

But in another posting, Chris Manning (an NLP textbook author!) restricts the usefulness of the F-measure to cases where there are no partial-match problems; e.g. using the F-measure for the NER task might be problematic.

Back in the IR field, I have started to think about the problem with the dominant measure of IR, MAP. It assumes binary relevance judgments, which is quite naive given the complex notion of relevance. As new metrics such as NDCG are starting to be widely adopted by the IR community, the limitation of MAP will become less significant.
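
The difference between the two measures can be seen in a small sketch. It uses one common formulation of NDCG (gain 2^g - 1, log2 discount) and, as a simplifying assumption, normalizes by the ideal ordering of the ranked list itself, i.e. it assumes every judged document appears in the list. The function names are my own.

```python
import math


def average_precision(ranked_rels):
    """Average precision for one query; ranked_rels holds 0/1 judgments in rank order."""
    num_relevant = sum(1 for rel in ranked_rels if rel)
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_rels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / num_relevant if num_relevant else 0.0


def dcg(gains):
    """Discounted cumulative gain over graded judgments in rank order."""
    return sum((2 ** g - 1) / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranked_gains):
    """DCG divided by the DCG of the ideal (descending) ordering of the same list."""
    ideal = dcg(sorted(ranked_gains, reverse=True))
    return dcg(ranked_gains) / ideal if ideal else 0.0
```

Take two rankings with graded judgments `[2, 1, 0]` and `[1, 2, 0]`. After collapsing the grades to binary relevance, both become `[1, 1, 0]` and get the same average precision, so MAP (the mean of average precision over queries) cannot tell them apart, whereas `ndcg` scores the ranking that puts the more relevant document first higher.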

Tags : IR,Essay
7 Oct

GuestBook

Thank you for visiting LiFiDeA.

Please click ‘comments’ to view the guestbook and leave your message.

Tags : Print Comments(1) Trackback