Cognitive Technologies

Managing Editors: D. M. Gabbay, J. Siekmann

Editorial Board: A. Bundy, J. G. Carbonell, M. Pinkal, H. Uszkoreit, M. Veloso, W. Wahlster, M. J. Wooldridge

Advisory Board:

Luigia Carlucci Aiello

Franz Baader

Wolfgang Bibel

Leonard Bolc

Craig Boutilier

Ron Brachman

Bruce G. Buchanan

Anthony Cohn

Artur d’Avila Garcez

Luis Fariñas del Cerro

Koichi Furukawa

Georg Gottlob

Patrick J. Hayes

James A. Hendler

Anthony Jameson

Nick Jennings

Aravind K. Joshi

Hans Kamp

Martin Kay

Hiroaki Kitano

Robert Kowalski

Sarit Kraus

Maurizio Lenzerini

Hector Levesque

John Lloyd

Alan Mackworth

Mark Maybury

Tom Mitchell

Johanna D. Moore

Stephen H. Muggleton

Bernhard Nebel

Sharon Oviatt

Luis Pereira

Lu Ruqian

Stuart Russell

Erik Sandewall

Luc Steels

Oliviero Stock

Peter Stone

Gerhard Strube

Katia Sycara

Milind Tambe

Hidehiko Tanaka

Sebastian Thrun

Junichi Tsujii

Kurt VanLehn

Andrei Voronkov

Toby Walsh

Bonnie Webber

Pierre M. Nugues

An Introduction to

Language Processing

with Perl and Prolog

An Outline of Theories, Implementation, and Application

with Special Consideration of English, French, and German

With 153 Figures and 192 Tables

Author:

Pierre M. Nugues

Institutionen för Datavetenskap

Lunds Tekniska Högskola

E-huset

Ole Römers väg 3

223 63 Lund, Sweden

Pierre.Nugues@cs.lth.se

Managing Editors:

Prof. Dov M. Gabbay

Augustus De Morgan Professor of Logic

Department of Computer Science, King’s College London

Strand, London WC2R 2LS, UK

Prof. Dr. Jörg Siekmann

Forschungsbereich Deduktions- und Multiagentensysteme, DFKI

Stuhlsatzenweg 3, Geb. 43, 66123 Saarbrücken, Germany

Library of Congress Control Number: 2005938508

ACM Computing Classification (1998): D.1.6, F.3, H.3, H.5.2, I.2.4, I.2.7, I.7, J.5

ISSN 1611-2482

ISBN-10 3-540-25031-X Springer Berlin Heidelberg New York

ISBN-13 978-3-540-25031-9 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable for prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media

springer.com

Printed in Germany

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Cover Design: KünkelLopka, Heidelberg

Typesetting: by the Author

Production: LE-TeX Jelonek, Schmidt & Vöckler GbR, Leipzig

Printed on acid-free paper 45/3100/YL 5 4 3 2 1 0

To my parents,

To Madeleine

Preface

In the past 15 years, natural language processing and computational linguistics have matured considerably. The move has mainly been driven by the massive increase in textual and spoken data and the need to process them automatically. This dramatic growth of available data spurred the design of new concepts and methods, or their improvement, so that they could scale up from a few laboratory prototypes to proven applications used by millions of people. Concurrently, the speed and capacity of machines grew by an order of magnitude, enabling us to process gigabytes of data and billions of words in a reasonable time, and to train, test, retrain, and retest algorithms like never before. Although systems entirely dedicated to language processing remain scarce, there are now scores of applications that, to some extent, embed language processing techniques.

The industry trend, as well as users' wishes, towards information systems able to process textual data has made language processing a new requirement for many computer science students. This has shifted the focus of textbooks from a readership of mostly researchers and graduate students to a wider public, and from specialist reading to pragmatism and applied programming. Natural language processing techniques are not completely stable, however. They range from well-mastered and routine to rapidly changing. This makes a new book an opportunity as well as a challenge.

This book tries to take on this challenge and find the right balance. It adopts a hands-on approach. A basic observation is that many students have difficulty going from an algorithm expressed in pseudocode to a runnable program. I did my best to bridge the gap and provide the students with programs and ready-made solutions. The book contains real code the reader can study, run, modify, and run again. I chose to write the examples in two languages, Perl and Prolog, to make the algorithms easy to understand and encode.

One of the major driving forces behind the recent improvements in natural language processing is the increase in text resources and annotated data. The huge amount of text made available by the Internet and never-ending digitization led many practitioners to evolve from theory-oriented, armchair linguists into frantic empiricists. This book attempts, as well as it can, to pay attention to this trend and
