Since the laboratory's two main books, published more than fifteen years ago,
have become bibliographic rarities, we offer their electronic versions here.
To read them, you need to install the DjVureader program, which can be downloaded, for example, from here.
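The books:
Лингвистическое обеспечение системы ЭТАП-2 [Linguistic Support of the ETAP-2 System] (9.2 MB)
Лингвистический процессор для сложных информационных систем [A Linguistic Processor for Complex Information Systems] (5.5 MB)
Articles: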
Igor M. Boguslavsky, Leonid L. Iomdin, et al. Interactive Resolution of Intrinsic and Translational Ambiguity in a Machine Translation System. // In: CICLing 2005. Lecture Notes in Computer Science. A. Gelbukh (ed.) // Springer-Verlag Berlin – Heidelberg, 2005, pp. 383–394
Abstract The paper presents the module of interactive word sense disambiguation and syntactic ambiguity resolution used within a sophisticated machine translation system, ETAP-3. The method applied consists in asking the user to identify a word sense, or a syntactic interpretation, whenever the system lacks reliable data to make the choice automatically. For this purpose, entries of the working dictionaries of the system are supplemented with clear diagnostic comments and illustrations that enable the user to choose the most appropriate option and in this way channel the course of system operation.
Igor Boguslavsky, Leonid Iomdin, Victor Sizov. Multilinguality in ETAP-3. Reuse of Linguistic Resources. // In: Proceedings of the Workshop “Multilingual Linguistic Resources”. 20th International Conference on Computational Linguistics // Geneva, 2004, pp. 7–14
Abstract The paper presents the work done at the Institute for Information Transmission Problems (Russian Academy of Sciences, Moscow) on the multifunctional linguistic processor ETAP-3. Its two multilingual options are discussed: machine translation in a variety of language pairs and translation to and from UNL, a meaning representation language. For each working language, ETAP has one integral dictionary, which is used in all applications both for the analysis and synthesis (generation) of the given language. In difficult cases, interactive dialogue with the user is used for disambiguation. Emphasis is laid on multiple use of lexical resources in the multilingual environment.
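L.L. Iomdin. Уроки машинного перевода для детей и взрослых [Machine Translation Lessons for Children and Adults]. // Лингвистика для всех. Зимняя лингвистическая школа-2004 [Linguistics for Everyone. Winter Linguistic School 2004] // Moscow, NIIRO, 2004, pp. 56–68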
Abstract What is machine translation, also known as automatic or computer translation? Today, when computers are used to translate texts from one language into another in many different ways, from bilingual and multilingual electronic dictionaries to translation memory systems (a "memory" or "archive" of translations), this question turns out to be less simple than it seems. We will understand machine translation as a process in which a computer, given a text in one language, produces a new text in another language that was not previously stored in that computer: clearly, neither dictionaries nor translation archives have this property. When can we say that text A in one natural language is a translation of text B in another language? Naturally, when both texts, A and B, have the same meaning. The task of any translator is precisely to convey the meaning of a text (whether written or spoken) in one language by means of another language. This is also the task of machine translation.
Jurij Apresian, Igor Boguslavsky, Leonid Iomdin, Alexander Lazursky, Vladimir Sannikov, Victor Sizov, Leonid Tsinman. ETAP-3 Linguistic Processor: a Full-Fledged NLP Implementation of the MTT. // In: MTT 2003. First International Conference on Meaning-Text Theory // Paris, Ecole Normale Superieure, 2003, pp. 279–288
Abstract A multifunctional NLP environment, the ETAP-3 linguistic processor, is presented. The environment, largely based on the Meaning ⇔ Text Theory, offers several NLP applications, including a machine translation system, a module of synonymous paraphrasing of sentences, a tagger for syntactic annotation of text corpora, a Universal Networking Language interface, a computer-assisted language learning tool, a natural language interface to SQL type databases, and a syntactic error correction module. While all applications are briefly discussed, emphasis is laid on machine translation, as it is by far the most advanced application of all.
Igor Boguslavsky, Leonid Iomdin, Victor Sizov. Interactive enconversion by means of the ETAP-3 system. // In: Proceedings of the International Conference on the Convergence of Knowledge, Culture, Language and Information Technologies // Alexandria, 2003
Abstract A module for enconversion of NL texts into Universal Networking Language (UNL) graphs is considered. This module is designed for the system of multilingual communication on the Internet that is being developed by research centers of about 15 countries under the aegis of the UN. The enconversion of NL texts into UNL is carried out by means of a multifunctional linguistic processor, ETAP-3, developed in the Computational Linguistics Laboratory of the Institute for Information Transmission Problems of the Russian Academy of Sciences. One of the major problems in automatic text analysis is the high degree of ambiguity of linguistic units. The resolution of this ambiguity (morphological, syntactic, lexical, translational) is partly ensured by the linguistic knowledge base of ETAP-3, but a complete algorithmic solution of this problem is unfeasible. We describe an interactive system that helps resolve difficult cases of linguistic ambiguity by means of a dialogue with the human user.
Jurij Apresian, Igor Boguslavsky, Leonid Iomdin, Leonid Tsinman. Lexical Functions as a Tool of ETAP-3. // In: MTT 2003. First International Conference on Meaning-Text Theory // Paris, Ecole Normale Superieure, 2003
Abstract The paper describes the use of lexical functions, an instrument proposed in Igor Mel'čuk's "Meaning ⇔ Text Theory" (MTT), in advanced NLP applications as exemplified in the ETAP-3 linguistic processor, including parsers, high-quality machine translation (MT), a system of paraphrasing and computer-aided learning of lexica. In parsing, collocate LFs are used to resolve or reduce syntactic and lexical ambiguity. The MT system resorts to LFs to provide idiomatic target language equivalents for source sentences in which both the argument and the value of the same LF are present. The system of paraphrasing, which automatically produces one or several synonymous transforms for a given sentence or phrase, can be used in a number of advanced NLP applications ranging from MT to authoring and text planning. The computer-aided system of learning lexica is also based on the concept of LFs as a tool of formal description of that part of vocabulary which is simultaneously systematic and idiomatic and is therefore most difficult for language acquisition.
Leonid Iomdin. Natural Language Processing as a Source of Linguistic Knowledge. // In: Proceedings of the International Conference on Machine Learning; Models, Technologies and Applications // Las Vegas, 2003, pp. 68–74
Abstract The paper discusses a number of specific problems of natural text parsing that emerge during the operation of a highly developed rule-based machine translation system, ETAP-3. Emphasis is laid on two classes of problems: 1) adequacy of linguistic description of the working languages of the MT system and 2) means of resolving lexical and syntactic ambiguity of the source text. It is claimed that no parser, however sophisticated or advanced, can be made entirely free of lacunae and gaps. The reason is that many of the linguistic facts, including those critical for parser operation, have never come into view of researchers simply because they have not had at their disposal mass material of unexpected or incorrect parsing. It is exactly such material that is amply provided by a highly developed NLP system. If handled properly, this feedback helps the researcher to find the gaps in scientific descriptions and eliminate them. Consequently, linguistic experimentation with NLP systems becomes a rightful and very promising scientific method. In a way, linguistic applications start to stimulate theoretical research, thus inverting the situation that has existed ever since NLP came to life.
Leonid Iomdin. Purpose and Idea: a Lesson Drawn from Machine Translation. // In: MTT 2003. First International Conference on Meaning-Text Theory // Paris, Ecole Normale Superieure, 2003, pp. 269–278
Abstract The paper discusses certain problems of natural text parsing that emerge during the operation of a machine translation system. Emphasis is laid on adequacy of syntactic description of the working languages. It is claimed that no parser, however sophisticated, can be made completely free of lacunae. The reason is that many of the linguistic facts critical for parser operation have never come into view of researchers because they have not had at their disposal mass material of unexpected or incorrect parsing. It is exactly such material that is abundantly provided by the output of a highly developed NLP system. If handled properly, this material helps the researcher to locate the gaps in linguistic descriptions and eliminate them. Consequently, linguistic experimentation with NLP systems becomes a rightful and very promising scientific method. In a way, linguistic applications start to stimulate theoretical research, thus inverting the situation that has existed ever since NLP came to life. To substantiate this standpoint, a specific type of Russian copulative compound sentences is considered in detail. A new type of syntactic feature is introduced in order to adequately handle such sentences.
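I.M. Boguslavsky, L.L. Iomdin, V.G. Sizov, I.S. Chardin. Использование размеченного корпуса текстов при автоматическом синтаксическом анализе [Using an Annotated Text Corpus in Automatic Syntactic Parsing]. // Труды Международной конференции «Когнитивное моделирование в лингвистике-2003» [Proceedings of the International Conference "Cognitive Modelling in Linguistics 2003"] // Varna, 2003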
Abstract A combined parsing algorithm is proposed, used in the ETAP-3 linguistic processor and, primarily, in its machine translation system. To resolve linguistic ambiguity, the heuristic rules that form the core of the processor interact dynamically with a specially developed statistical module, which assigns weights to hypothetical syntactic links on the basis of data from a syntactically annotated text corpus. The corpus data were collected from syntactically annotated Russian texts totalling 6,900 sentences (about 104,000 words). Experiments on Russian-to-English machine translation using the combined algorithm revealed local improvements in the performance of the linguistic processor, which stimulate qualitative development of the parser and open new prospects for its developers. At the same time, a quantitative comparison of the combined and heuristic parsing algorithms showed no significant differences in their results.