Cognitive Technologies
Managing Editors: D. M. Gabbay, J. Siekmann
Editorial Board: A. Bundy, J. G. Carbonell, M. Pinkal, H. Uszkoreit, M. Veloso, W. Wahlster, M. J. Wooldridge
Advisory Board:
Luigia Carlucci Aiello
Franz Baader
Wolfgang Bibel
Leonard Bolc
Craig Boutilier
Ron Brachman
Bruce G. Buchanan
Anthony Cohn
Artur d’Avila Garcez
Luis Fariñas del Cerro
Koichi Furukawa
Georg Gottlob
Patrick J. Hayes
James A. Hendler
Anthony Jameson
Nick Jennings
Aravind K. Joshi
Hans Kamp
Martin Kay
Hiroaki Kitano
Robert Kowalski
Sarit Kraus
Maurizio Lenzerini
Hector Levesque
John Lloyd
Alan Mackworth
Mark Maybury
Tom Mitchell
Johanna D. Moore
Stephen H. Muggleton
Bernhard Nebel
Sharon Oviatt
Luis Pereira
Lu Ruqian
Stuart Russell
Erik Sandewall
Luc Steels
Oliviero Stock
Peter Stone
Gerhard Strube
Katia Sycara
Milind Tambe
Hidehiko Tanaka
Sebastian Thrun
Junichi Tsujii
Kurt VanLehn
Andrei Voronkov
Toby Walsh
Bonnie Webber
Pierre M. Nugues
An Introduction to
Language Processing
with Perl and Prolog
An Outline of Theories, Implementation, and Application
with Special Consideration of English, French, and German
With 153 Figures and 192 Tables
Author :
Pierre M. Nugues
Institutionen för Datavetenskap
Lunds Tekniska Högskola
E-huset
Ole Römers väg 3
223 63 Lund, Sweden
Pierre.Nugues@cs.lth.se
Managing Editors:
Prof. Dov M. Gabbay
Augustus De Morgan Professor of Logic
Department of Computer Science, King’s College London
Strand, London WC2R 2LS, UK
Prof. Dr. Jörg Siekmann
Forschungsbereich Deduktions- und Multiagentensysteme, DFKI
Stuhlsatzenweg 3, Geb. 43, 66123 Saarbrücken, Germany
Library of Congress Control Number: 2005938508
ACM Computing Classification (1998): D.1.6, F.3, H.3, H.5.2, I.2.4, I.2.7, I.7, J.5
ISSN 1611-2482
ISBN-10 3-540-25031-X Springer Berlin Heidelberg New York
ISBN-13 978-3-540-25031-9 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material
is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication
of this publication or parts thereof is permitted only under the provisions of the German Copyright
Law of September 9, 1965, in its current version, and permission for use must always be obtained from
Springer. Violations are liable for prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2006
Printed in Germany
The use of general descriptive names, registered names, trademarks, etc. in this publication does not
imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
Cover Design: KünkelLopka, Heidelberg
Typesetting: by the Author
Production: LE-TeX Jelonek, Schmidt & Vöckler GbR, Leipzig
Printed on acid-free paper 45/3100/YL 5 4 3 2 1 0
To my parents,
To Madeleine
Preface
In the past 15 years, natural language processing and computational linguistics have
considerably matured. The move has mainly been driven by the massive increase
of textual and spoken data and the need to process them automatically. This dramatic
growth of available data spurred the design of new concepts and methods, or the
improvement of existing ones, so that they could scale up from a few laboratory
prototypes to proven applications used by millions of people. Concurrently, the speed
and capacity of machines grew by an order of magnitude, enabling us to process
gigabytes of data and billions of words in a reasonable time, and to train, test, retrain,
and retest algorithms like never before. Although systems entirely dedicated to language
processing remain scarce, there are now scores of applications that, to some extent,
embed language processing techniques.
The industry trend towards information systems able to process textual data, together
with users' demand for such systems, has made language processing a new requirement
for many computer science students. This has shifted the focus of textbooks from an
audience of mostly researchers and graduate students to a wider public, and from
specialist reading to pragmatism and applied programming. Natural language processing
techniques are not completely stable, however. They form a mix that ranges from
well-mastered, routine methods to rapidly changing ones. This makes a new book an
opportunity as well as a challenge.
This book tries to take on this challenge and find the right balance. It adopts a
hands-on approach. It is a basic observation that many students have difficulty going
from an algorithm expressed in pseudocode to a runnable program. I did my best
to bridge the gap and provide the students with programs and ready-made solutions.
The book contains real code the reader can study, run, modify, and run again. I chose
to write the examples in two languages, Perl and Prolog, to make the algorithms easy
to understand and encode.
One of the major driving forces behind the recent improvements in natural language
processing is the increase of text resources and annotated data. The huge
amount of text made available by the Internet and the never-ending digitization led
many practitioners to evolve from theory-oriented, armchair linguists to frantic
empiricists. This book attempts as well as it can to pay attention to this trend and