The main goal of the LIA project is to rescue old and endangered recordings of Norwegian and Sami, annotate them, and make them accessible in an electronic database (corpus) for research in linguistics and technology.

Issues pertaining to linguistics concern:
1) Diachronic data: The dialect archives at Norwegian universities contain recordings of Norwegian and Sami that have been built up over the last sixty years. Most of the recordings are old and in acute danger of being destroyed by bad storage conditions and demagnetisation. The project will digitise these recordings and make a selection accessible through transcription linked to audio, grammatical tagging and simple annotation, and searchable in a corpus.
2) Emigrant data: Field work in America has resulted in Norwegian-language recordings from as far back as 1931 until today. These will be made available, and some of them annotated like the rest of the speech data.
All the types of linguistic material will also be made searchable in the corpus system Glossa. For the first time, a Sami spoken language corpus will appear.

On technological development, the corpus search system Glossa will be improved so that it can handle hierarchical structures. Issues pertaining to technology also concern, among other things, parser development: a syntactic spoken language parser will be developed for Norwegian and Sami. It will be built on the output of our existing morphological taggers.

We describe a web-based corpus query system, Glossa, which combines the expressiveness of regular query languages with the user-friendliness of a graphical interface. Since corpus users are usually linguists with little interest in technical matters, we have developed a system where the user need not have any prior knowledge of the search system. Furthermore, no previous knowledge of abbreviations for metavariables such as part of speech and source text is needed. All searches are done using checkboxes, pull-down menus, or by typing plain letters to form words or other strings. Querying for more than one word is done simply by adding an additional query box, and querying for parts of words by choosing a feature such as "start of word". The Glossa system also allows a wide range of viewing and post-processing options. Collocations can be viewed and counted in a number of ways, and displayed as different kinds of graphical charts. Further annotation and deletion of single results for further processing is also easy. The Glossa system is already in use for a number of corpora. Corpus administrators can easily adapt the system to a wide range of corpora, including multilingual corpora and corpora with audio and video content.

We will look at how maps can be integrated in research resources, such as language databases and language corpora. By using maps, search results can be illustrated in a way that immediately gives the user information that words or numbers on their own would not give. We will illustrate with two different resources, into which we have now added a Google Maps application: The Nordic Dialect Corpus (Johannessen et al. 2009) and The Nordic Syntactic Judgments Database (Lindstad et al.). The database contains some hundred syntactic test sentences that have been evaluated by four speakers in more than a hundred locations in Norway and Sweden. Searching for the evaluations of a particular sentence gives a list of several hundred judgments, which are difficult for a human researcher to assess. With the map option, isoglosses are immediately visible. We show in the paper that, both with the maps depicting corpus hits and with the maps depicting database results, the map visualizations show clear geographical differences that would be very difficult to spot just by reading concordance lines or database tables.
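As a rough illustration of why a map helps with several hundred judgments, the following Python sketch collapses per-speaker judgments for one test sentence into a single marker per location, which a map layer could then colour. It is not the implementation used in the resources described here. The record layout, the 1 to 5 acceptability scale, the threshold, the place names, the coordinates and the function name `markers_for_sentence` are all assumptions made for the example, and the data values are invented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical judgment records for one test sentence:
# (location, latitude, longitude, score). The 1-5 scale, the places,
# the coordinates and the scores are all invented for illustration.
JUDGMENTS = [
    ("Place A", 60.0, 10.0, 5),
    ("Place A", 60.0, 10.0, 4),
    ("Place B", 63.0, 11.0, 1),
    ("Place B", 63.0, 11.0, 2),
    ("Place C", 59.0, 17.0, 3),
    ("Place C", 59.0, 17.0, 3),
]


def markers_for_sentence(judgments, threshold=3.0):
    """Collapse per-speaker judgments into one map marker per location.

    A location counts as accepting the sentence when its mean score
    reaches the (assumed) threshold.
    """
    scores_by_place = defaultdict(list)
    coords = {}
    for place, lat, lon, score in judgments:
        scores_by_place[place].append(score)
        coords[place] = (lat, lon)

    markers = []
    for place in sorted(scores_by_place):
        avg = mean(scores_by_place[place])
        lat, lon = coords[place]
        markers.append({
            "place": place,
            "lat": lat,
            "lon": lon,
            "mean_score": avg,
            "accepted": avg >= threshold,
        })
    return markers


if __name__ == "__main__":
    for marker in markers_for_sentence(JUDGMENTS):
        print(marker)
```

Each resulting dictionary carries what a map marker needs: a position and a value to colour by. Plotting accepted and rejected locations in different colours is what would let a boundary between them, an isogloss, stand out at a glance.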