10 May
한때 지금과 같은 형태의 컨퍼런스가 사라질 것이라고 생각한 적이 있었습니다. 모든 분야의 최신 정보가 실시간으로 전파되며 그 수단 역시 텍스트 위주에서 비디오 등으로 다양화되는 요즈음, 왜 굳이 몇달 전부터 논문을 준비하여 심사를 받고, 막대한 비용을 들여가며 만나야 하는지 이해할 수 없었습니다. 사람들이 조금만 더 합리적으로 생각한다면 번거로운 컨퍼런스 대신 발표 동영상을 올리고 실시간으로 comment를 받는 식으로 바뀌리라는 상상을 하기도 했습니다.
첫 컨퍼런스인 ECIR에 참석하고 나서야 이것이 얼마나 단견이었는지를 느꼈습니다. 컨퍼런스는 정보와 지식만큼이나 관계 형성과 감정의 교류를 위한 장이라는 깨달음을 얻었기 때문입니다. 공부는 인터넷으로도 할 수도 있지만, 지구 저편에서 온 연구자와 공감할 때 느껴지는 쾌감을 얻을 수는 없으니까요.
첫번째 가르침은 의외의 곳에서 찾아왔습니다. 도착한 첫날, 여정을 풀고 숙소 주변을 둘러보고 있는데, 누군가 제 이름을 불러서 돌아보니 지도교수님께서 야외 테이블에서 맥주를 한잔 하고 계셨습니다. 반가운 마음에 자리에 앉아 한참 대화를 나누다가 “첫 컨퍼런스를 성공적으로 보내는 방법은 무엇입니까?”라는 질문을 드렸습니다. 교수님께서는 잠시 생각하시더니 뜻밖에도 “최대한 사람들을 많이 만날 것, 그리고 마음이 맞는 사람들과 깊이있는 대화를 나눌 것”이라고 말씀하셨습니다. 그때는 ’지식’보다 ’사람’을 강조하시는 점을 의아하게 생각했지만 며칠이 지나서 그 뜻을 알 수 있었습니다.
둘째날 튜토리얼과 웍샵이 있었고 저는 Interactive IR Workshop과 Information Extraction Tutorial에 참석했습니다. 논문으로만 접하던 학자들의 발표도 발표였지만, 연구자들이 모인 자리인지라 즉석에서 생동감있는 토론이 벌어지는 것이 인상적이었습니다. 저녁에는 참석자들과 근처의 Bar에 가서 한밤중까지 많은 이야기를 나누었습니다. University of Glasgow를 비롯한 유럽 각지에서 온 사람들이라, 미국의 대학원 생활 그리고 한국에 대해 많은 관심을 나타냈습니다. 첫 만남이었음에도 연구자라는 공통점 떄문인지 금방 친해질 수 있었습니다.
그 다음날 지도교수님의 기조연설을 시작으로 본행사가 시작되었습니다. Query Evolution이라는 주제의 연설은 지금까지 키워드 처리에 중점을 맞추어 개발된 검색엔진이 좀더 자연어에 가까운 길고 복잡한 질의어를 다룰 수 있어야 한다는 요지의 발표였습니다. 이어진 논문 발표를 들으면서 많은 발표자들이 30분이라는 제한된 시간에 지나치게 많은 내용을 전달하려고 한다는 인상을 받았습니다. 저녁에 숙소에 돌아와서는 슬라이드의 불필요한 문장과 내용은 모두 빼고 최종 연습을 했습니다. Glasgow에서 온 친구들이 연습 발표를 지켜봐주어 많은 도움이 되었습니다.
드디어 발표날, 수차례의 연습 덕인지 발표가 가까워질수록 마음이 편해지면서 자신이 생겨 실제 발표는 즐겁고 편안하게 할 수 있었습니다. 발표 직후에 ‘단순하면서도 활용도가 높은 연구였다’, ‘알아듣기 쉬웠다’는 comment를 받으며 보람을 느꼈습니다. 특히, 검색학회의 원로이자 Microsoft Research Cambridge의 책임자인 Stephen Robertson경 등을 비롯하여 존경하는 연구자들이 많이 참석한 자리라 더욱 기뻤습니다.
발표를 마치고는 좀더 편한 마음으로 남은 일정에 임할 수 있었습니다. 그제서야 봄을 맞은 남프랑스의 아름다운 정경이 눈에 들어왔습니다. 컨퍼런스를 마치고는 이틀 정도 파리에 머물며, 루브르 박물관, 퐁피두 센터, 피카소 박물관 그리고 몽마르트 언덕 등을 주마간산으로나마 돌아보았습니다. 궁금하신 분들을 위해 웹 겔러리에 사진을 올렸습니다.
긴장도 많이 했고 시행착오도 있었지만, ECIR 참석은 평생 잊을 수 없는 기억이 될 것 같습니다. Amherst에서 고민하면서 작업할 때에는 보잘것없이 여겨지던 작업이지만, 이렇게 세계 각국의 학자들과 교감을 할 수 있는 발판이 되었다는 사실이 놀랍기만 합니다. “자신이 선택한 길을 최선을 다해 걸어갈 때, 그 길의 끝에서 세상을 만날 수 있다”는 구본형 선생님의 책 한구절이 떠오릅니다.
Tags :
컨퍼런스,
IR
Print
Comments(1)
Trackback
25 Apr
One of things you realize as a CS grad student is that writing a good code is not considered as important as you thought it would be. Since your research codes only need to get a data for your experiment, this downplay of coding seems to make a sense.
When I ask my fellow grad students, their responses are: “Why do you care about the code that should be used once and for all?.” For this reason, mose people end up writing ad-hoc scripts which seem to be seldom reused (even re-read, since they typically use Perl — a write-only language)
I took a different view from that. 3 years of experience as a software developer let me know that it is not good for your well-being(!) to see ugly code every day. And experiments we do as an IR grad studnet should not be that entirely different all the time.
After a year and half passed since I got here and I seem to know a bit better about IR experiments than before. For every new experiment I ran, I tried to extend and generalize existing code rather than starting from scratch, which left a considerable amount of Ruby code. (Yes, my choice of language is Ruby. After all, it is a language purportedly designed for the pleasure of programming. How appealing is that?)
The resulting software – dubbed ‘iLab’ – consists of the framework that is common to every experiment and the part that supports individual experiment. Since the framework part provides the object abstraction of every usual stuff IR experimentors deal with — document, query, retrieval engine, you can build your experiment by just calling it.
The good point of using iLab as opposed to building ad-hoc code for each experiment seems evident. When you want to work on a new collection or task, you can do it with simple set of API call. Compare this with having to copy-and-paste existing code, which will result in a pile of buggy codes which you may not want to look at again.
iLab also enables you to do things which would not be possible at all in ad-hoc scripting. For instance, you can build a more sophisticated experiment by combining smaller, simpler experiments. If you need to run a cross-validation of some machine learning algorithm, this can be a useful feature.
If you’re interested in, here’s the slide that briefly introduces iLab. Also, check out the follow experimental result which is auto-generated by iLab. (For a tip, if you click each heading, you can sort reports by that criteria)
Since iLab is not still distribution-ready, please let me know if you want to try it so that I can be motivated to make further effort.
Tags :
IR,
Experiment
Print
Comments
Trackback
21 Aug
웹의 발전은 정보검색(IR)이라는 분야를 도서관학의 일개 전공분야에서 최신 기술과 막대한 인력, 자원이 투입되는 거대 산업으로 바꾸어 놓았습니다. 지금도 대부분의 사람에게 웹 검색은 일상사가 되었지만, 앞으로도 위키/블로그와 같은 데이터 생산 및 공유 기술의 개발, 그리고 그에 따른 데이터의 증가에 따라 웹 검색의 수요는 계속 늘어날 것입니다.
규모도 규모지만, 웹은 IR 연구자들에게 항상 새로운 과제를 제시합니다. 홈페이지 검색, 뉴스 검색, Q&A 검색, 블로그 검색 등은 웹이 없었다면 생기지도 않았을 IR의 세부 연구주제입니다. 새로운 형태의 웹 서비스는 고유의 검색 문제를 안고 태어난다고 해도 과언이 아닙니다. 문제만 준다면 조금 얄미울텐데, 다행히 대부분은 해결의 실마리도 따라옵니다. 웹 문서의 링크 구조를 활용하는 PageRank 알고리즘, 검색에 활용되기 시작한 태그 데이터 등이 그 사례입니다.
하지만 웹의 발전이 IR연구자에게 주는 가장 귀한 선물은 역시 검색 로그(query log)가 아닐까 합니다. 사용자가 서비스에 접속하여 어떤 질의어를 입력하는지, 그리고 어떤 문서를 클릭하는지를 그대로 기록한 검색 로그에는 연구자들이 궁금해하는 검색의 비밀이 고스란히 담겨있기 때문입니다. 더 좋은 것은 이런 데이터를 거의 무제한으로 비용 없이 얻을 수 있다는 점입니다.
문제는 검색 업체가 아닌 이상은 이 데이터에 접근할 수 없다는 점인데, 최근에 MSR에서 검색 로그 데이터를 선별된 연구자에게 제공하는 워크샵을 제안했습니다. MSN의 검색 로그 1500만건이 클릭 데이터와 같이 제공된다고 하니 흔치않은 기회인 것 같습니다.
그렇다면 검색 로그로 무엇을 할 수 있을까요? 가장 단순하게는 질의어 통계를 낼 수 있을 것입니다. 네이버나 다음에서 보여주는 ‘인기 검색어’등이 그 예입니다. 또한, 사용자 활동이 세션(접속→종료)별로 기록된다는 점을 감안하면 사용자가 먼저 입력한 질의어를 가지고 다음 질의어를 예측하는 모델을 만들 수 있을 것입니다. 질의어 맞춤법 교정(query correction)이나 관련 검색어 제안 등이 이러한 사례입니다. 이를 좀더 확장하면 사용자의 검색 실력을 감지하여 그에 따라 적절한 처리를 해줄 수도 있습니다.
올해 SIGIR에서 화제가 되었듯이, 질의어를 분류하고 군집화하여 적절한 처리를 하는 데에도 검색 로그는 필수적입니다. 트렉 등에서 제공하는 수십 수백건의 질의어를 갖고 의미있는 질의어 모델을 만드는 것은 상식적으로 불가능하겠죠. 또한 연구용 컬렉션(문서-질의어 모음)은 실제 검색 서비스에서 나온 데이터가 아니라는 한계도 있습니다.
각 검색에에 대해 클릭된 문서가 질의어에 관련하여 옳은(relevant) 문서라고 간주하면, 대용량 검색 로그를 가지고 검색 모델(retrieval model) 자체를 학습하는 것도 가능합니다. 실제 검색 로그는 검색 서비스를 개발하고 개선하는 데 기반이 되며, 검색 모델 학습(Learning to rank)이 웹 검색의 폭발적 확산과 함께 본격적으로 연구되기 시작한 것도 우연은 아닐 겁니다. 하지만 사용자가 목록에 있는 모든 문서를 검토하고 클릭을 하는 것은 아니며, 원래 검색 의도와 다른 클릭도 있을 수 있기 때문에, 이런 방식으로 검색 로그를 활용하기 위해서는 다양한 노이즈 제거 기법이 적용되어야 합니다.
이러한 검색 로그 분석에는 대용량 데이터가 사용되기 떄문에 MapReduce 등의 대용량 병렬 처리 환경이 필요합니다. 저는 최근에 연구실에 있는 검색 로그를 학교의 Hadoop Cluster에서 분석하고 있는데, Pig라는 환경이 MapReduce를 데이터베이스(RDBMS)처럼 편리하게 사용하도록 도와줍니다. 좀더 익숙해지면 이곳을 통해 소개하도록 하겠습니다.
Tags :
IR,
검색로그
Print
Comments(2)
Trackback
12 Jul
PubDate(2004), PubPlace(SIGIR) Author(Nallapati)
keyword(Discriminative Model,SVM,Maximum Entropy)
Summary
Applying discriminative model to IR task resulted in improved result for Homepage finding task that typically requires combining many arbitrary features.
Content
Background
- People most have approached IR using generative classifier
- LMIR : Assuming that each document forms its own class, it’s similar to classifying given query to best-matching document class using Naive-bayes classifier
Contribution
- Problem for IR as classification
- Out-of-vocabulary problem : query-word may not appear in document
- Overcome by aggregate feature
- Unbalance of relevance/non-relevance class
- Overcome by undersampling nr class
Experiment
Future Work
Comment
Reference
Tags :
Paper,
IR
Print
Comments
Trackback
10 Jul
PubDate(2008), PubPlace(TIS) Author(Cohen,Domshlak,Zwerdling)
keyword(Desktop search,Learning to rank,)
Summary
Desktop search experiment with known-item search task.
Content
Background
- Stuff I’ve seen(SIS) : users sort result by last-update date more frequently than by IR ranking
- The older the data, the less often it is used
- Type of information stored in desktop
- Ephemeral : reminder, short-lived
- Working : related to ongoing work
- Archive : long-term resource
Contribution
- Novel Feature
- Level : the distance of a file from uppermost directory
- DirRank : The probability to open a file in specific directory is proportional to the number of files previous opened from this (and its sub) directory. (normalized by files in directory)
- Selectivity ; combine values of content-similarity feature by the inverse of of the file number with non-zero feature value.
- e.g. If user query was match with the filename field of 100 files, the score of name field is divided by 100.
Experiment
- Queries are grouped by the no. of results returned
- As more results are returned by each query, date-related features became more useful. (selectivity)
- Combination of result by selectivity was proven to be as effective as learning-based methods
Future Work
Comment
Reference
Tags :
Paper,
IR
Print
Comments
Trackback
10 Jul
PubDate(1999), PubPlace() Author(Voorhees)
keyword(NLP,IR)
Summary
NLP doesn’t help ad-hoc retrieval.
Content
Background
Contribution
- Term-normalization may help
- Term-mismatch is major performance degrading factor.
- No evidence that ‘meaning’ representation can boost perf.
- Queries are troublesome for NLP-IR.
- With sufficient context, BoW model implicitly disambiguates query-term.
- e.g. Using document similarity measure to build word sense classifier (Word occurrences in similar documents can be considered to have the same meaning)
Experiment
- Will ‘concept indexing’ based on WordNet synset improve effectiveness?
- Using extended VSM : linear interpolation of similarity btw. the query and several vectors of document representation
- Disambiguation : nouns are disambiguated into a synset id
- 3 subvectors : 1) words not disambiguated 2) synset ids of disambiguated nouns 3) stems of disambiguated nouns
- Concept indexing hurt performance in most cases
- More of term mismatches due to errors in word sense disambiguation
- Some queries were helped
- ‘101’ run where non-stemmed verbs and adjectives are not counted as match showed worse perf. than baseline — impact of term mismatch again
Future Work
Comment
Reference
Tags :
Paper,
IR
Print
Comments
Trackback
12 Jul
PubDate(2002), PubPlace(SIGKDD) Author(Joachims)
keyword(SVM,Clickthrough,Pairwise preference,Learning to rank,Metasearch)
Content
Background
- LTR based on expert-judged relevance
Contribution
- Generation pairwise pref. using clickthrough
- Training this binary ordering using SVM
- By minimizing rank correlation to optimal ranking via Kendall’s

- Meta-search (Strive) engine to compare the learned ranking function with existing methods
- To perform unbiased comparison of different rankings with clickthrough data, rank combination method that equally presents the links from each system.
Experiment
- As more data was used for training, the algorithm showed lower rate of error.
Future Work
Comment
The author used the search engine himself to verify his result. I may need to build metasearch engine myself, which can be useful for a variety of tasks.
Reference
Tags :
Paper,
IR
Print
Comments
Trackback
25 Jan
One of common questions IR researchers ask is: ‘Is this new retrieval method better than the old one?’ Mostly, we turn to Statistics for the answer, which is the method called ‘significance test’
Given two sets of performance measurements(typically MAP or Precision@K) for both systems, we run significance test and get the probability that the both result set is from the same distribution. If this probability is smaller than some predetermined value (e.g. 0.05), we know that there is no significant difference between the performance of these systems.
In Statistical terms, the probability here is called ‘P-value’ and the assumption that both set is from the same distribution is called ‘Null Hypothesis’, which we may hope to deny. (especially if we devised this new method)
As you may guess, there are many methods for significance test used for IR, differentiated by the assumptions they make— underlying distributions, and so on. According to recent paper in which these methods are compared, it is found that randomization test, bootstrapping test and t-test shows the same result, while Wilcoxon and sign test, simplified forms of randomization test, shows different result from others and therefore discouraged from the use.
Tags :
Essay,
IR,
Statistics
Print
Comments
Trackback
21 Jan
This is the survey article I found while taking IR class last Fall. While this article seemed interesting from its title, I couldn’t get the good grasp of this one as I had little understanding of the IR field in general.
When I read this one again a few days ago, I could finish this one with greater interest. Not only did it provide me with a well-chosen reading list, but also it gave me a clue on IR research trends seen from the perspective of papers published and well-received.
Since my life as a grad student may circle around the papers, it should be worthwhile to summarize lessons I learned.
What makes a ‘classic’ paper?
My first curiosity was why these handful of papers were chosen among thousands of IR papers that came out to the world so far. What made them so special?
Novelty
Any research paper should be ‘new’ in some ways— that’s what makes it a ‘research’. Yet many of these classic papers are greater and more beneficial in their novelty. Some of them started to ask questions people have never even thought of before, some others provided a whole new perspective to an existing problem, still others applied existing technique and theory to a new venue of problems or brought in the knowledge of other field to solve an IR problem.
While selected papers are top-quality in most other criteria, some were selected despite their obvious limitation in methodology or performance, from which we can see the value of bringing in new ideas and approaches.
Result
Since IR is a field rich in performance metric— although few seem to know which one is the best, a work with improved performance is noteworthy. Based on my 5-month long observation in CIIR, there seem to be many cases in which a method with superior result comes out first— by some chance or mistake(!), followed by theoretical justification.
Of course, given that their performance improvement is consistent and significant, most of these ‘result’ papers are proven to be novel later on.
Methodology
If I say that ‘novelty’ papers are excellent in finding a problem worth-solving, some papers draw attention for how they solved the problems. Even without groundbreaking idea or superior result, these papers are read by many people as they teach valuable lessons — mostly in terms of experiment design and interpretation. These ‘methodology’ papers should be especially valuable for students just entering the field.
Survey
As a topic is established as a field of research and the result accumulates for some time, it becomes increasingly for individual researchers to follow-up the result of past research. That’s where the survey papers are needed, in which most of major discoveries are summarized in a single paper.
Which track should one pursue?
Given these conditions for good papers, researchers may ask themselves what their strategy should be here, since most papers seem to have strength in a particular criteria — although there a good number of papers qualified in every perspective.
Here’s my crude suggestion. (Although I know I don’t know well enough to make this kind of remark.)
- Novelty Papers : If you are confident about your creative potential — you tend to pinpoint things most people may not come up with. To be successful this way, you should read a wide breadth of literature (even in related fields), which may give you a useful combination of ideas no one thought of before.
- Result Papers : If you’re good at tweaking with a variety of retrieval parameters and settings, getting superior result may be easier for you. All you need to do is find the theory that ‘explains’ your result.
- Methodology Papers : If you have considerable experience and rigor to investigate given issue better than most people, you may turn most problem you work on into quality research paper — even without good result(!).
- Survey Papers : This kind of work would probably be left to guru-level research whose research career shows the advancement of the field itself.
Reference
Tags :
Essay,
IR
Print
Comments
Trackback
29 Nov
This might be my first posting as a IR research. I just entered Information Retrieval Lab in UMass, having a busy time getting used to the life in USA while starting my career as a research.
While I have considered blogging as a good pastime activity, I decided that I may even need to do blogging for the purpose of my research. It may help me develop immature research ideas, learn how others think differently and see things from a more relaxed perspective — a research should not be considered a work to be fruitful.
As an novice blogger, I started to read what others wrote about research. The article about right measure to use drew my attention today. In most of AI-like problems, having the right measure is critical since you’d be optimizing for the wrong direction otherwise. Conversely, if only you got the right measure, you can improve your result and find how good it worked.
The author(Hal) says that F-measure(weighted harmonic mean of precision and recall) is desirable for classification problems where the problem space can be divided into answer vs. non-answer, in addition to well-known rarity reasons — accuracy is useless when one(usually non-answer) class takes up majority. This assertion seem to be semantically correct in a sense that precision and recall – components of F-measure – are defined assuming problem space division suggested by the author.
But in another posting by Chris Manning (a NLP textbook author!), the usefulness of F-measure is restricted to the cases where there are no partial-match problems — e.g. using F-measure for NER task might be problematic.
Back in IR field, I start to think about the problem of dominant measure of IR — MAP. It assumes binary relevance judgment, which is quite naive given the complex notion of relevance. As the new metrics such as NDCG are starting to be widely adopted by IR community, the limitation of MAP will become less significant.
Tags :
IR,
Essay
Print
Comments
Trackback