Academic research related to the Open Directory Project (ODP). The listed research papers may cite ODP as an example of a large web directory, describe studies or tests based on ODP data, or focus on ODP itself.

1.
Improving Web Search Results Using Affinity Graph
-
Benyu Zhang, Hua Li, Yi Liu, Lei Ji, Wensi Xi, Weiguo Fan, Zheng Chen, Wei-Ying Ma. The 28th Annual International ACM SIGIR Conference. The authors propose a ranking scheme named Affinity Ranking (AR). Yahoo, ODP and newsgroup data are used for the experiments.

2.
Index Construction for Linear Categorisation
-
Vaughan R. Shanks, Hugh E. Williams. RMIT University, Melbourne, Australia. Proceedings of the twelfth international conference on Information and knowledge management. A problem with iterative training techniques for automatic text categorisation, such as Support Vector Machines (SVM), is that during the learning phase they require the entire training collection to be held in main memory, which is infeasible for large training collections such as DMOZ or large newswire feeds. The authors present techniques that permit automatic categorisation using very large training collections, vocabularies, and numbers of categories. ODP is mentioned as a possible set of training data.

3.
OCELOT: A System for Summarizing Web Pages
-
Adam L. Berger, Carnegie Mellon University, and Vibhu O. Mittal, Just Research, Pittsburgh, USA. Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval. Probabilistic models are used to select and order words into a gist. The paper describes a technique for learning these models automatically from a collection of human-summarized web pages; the authors used ODP data for this purpose.

4.
OCFS: Optimal Orthogonal Centroid Feature Selection for Text Categorization
-
Jun Yan, Ning Liu, Benyu Zhang, Shuicheng Yan, Zheng Chen, Qiansheng Cheng, Weiguo Fan, Wei-Ying Ma. Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval. Experiments based on 20 Newsgroups (20NG), Reuters Corpus Volume 1 (RCV1) and ODP data show that OCFS is a consistently better feature selection method than Information Gain (IG) and the χ²-test (CHI).

5.
On Labeling Schemes for the Semantic Web
-
Vassilis Christophides, Dimitris Plexousakis, Michel Scholl, Sotirios Tourtounis. Proceedings of the 12th international conference on World Wide Web. The paper deals with optimizing navigation through voluminous subsumption hierarchies of topics. The storage and query evaluation performance of two labeling schemes is compared for the 16 ODP hierarchies.

6.
Summarizing Web Sites Automatically
-
Y. Zhang, N. Zincir-Heywood, E. Milios, Dalhousie University, Canada. Proceedings of the Sixteenth Conference of the Canadian Society for Computational Studies of Intelligence. Machine learning and natural language processing techniques are employed to automatically summarize web pages. The summaries are compared with ODP descriptions and with the results of browsing experiments.

7.
The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases
-
Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ODP RDF dump was used as a testbed for a suite of tools for RDF validation, storage and querying.

8.
The Structure of Broad Topics on the Web
-
Soumen Chakrabarti, Mukul M. Joshi, Kunal Punera, David M. Pennock. IIT Bombay and NEC Research Institute. Proceedings of the 11th international conference on World Wide Web. Many studies on the Web graph concentrate on the graph structure, and do not consider textual properties of the nodes. The authors propose that a topic taxonomy such as Yahoo or ODP provides a useful framework for understanding the structure of content-based clusters and communities, and they present measurements that may prove valuable in the design of community-specific crawlers and link-based ranking systems. The experiments are based on ODP data.

9.
Topical TrustRank: Using Topicality to Combat Web Spam
-
Baoning Wu, Vinay Goel and Brian D. Davison propose to partition the seed set used in TrustRank by topic and calculate trust scores for each topic separately, making use of the Open Directory Project. Paper presented at the 15th International World Wide Web Conference.

10.
W3C Semantic Web Activity
-
Marja-Riitta Koivunen and Eric Miller. Proceedings of the Semantic Web Kick-off Seminar in Finland. The authors explain the main Semantic Web principles and key technological layers, and present W3C activity and sample applications. The ODP RDF dump is used to show how easy it is to merge RDF-based information.

11.
Web-Page Summarization Using Clickthrough Data
-
Jian-Tao Sun, Dou Shen, HuaJun Zeng, Qiang Yang, Yuchang Lu, Zheng Chen. The 28th Annual International ACM SIGIR Conference. The authors propose two adapted summarization methods that take advantage of the relationships discovered from clickthrough data. For those pages not covered by clickthrough data, they put forward a thematic lexicon approach to generate implicit knowledge. The methods are evaluated on a relatively small dataset consisting of manually annotated pages as well as a large dataset crawled from ODP.