A Software System for Automatic Construction of a Semantic Word Network

Dmitry A. Ustalov, Andrey V. Sozykin

Abstract


A semantic word network is a network that represents semantic relations between individual words or their lexical senses. In this paper, we present SWN, a software system for the automatic construction of such a network. The system implements unsupervised methods for concept discovery and semantic relation establishment, as well as a supervised method for relation expansion. These methods rely on widely available language resources, such as semantic relation dictionaries and background text corpora. The domain model is presented using the VOWL notation. The system architecture is described using the UML notation and comprises a concept discovery module, a semantic relation construction module, a Semantic Web export module, and an evaluation dataset construction module based on microtask crowdsourcing. The system is open source and is available for integration into third-party data mining systems.
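To make the notion of a semantic word network concrete, the following minimal Python sketch groups words into concepts and links those concepts by hypernymy. It is an illustration under stated assumptions, not the SWN implementation: the function names `discover_concepts` and `link_concepts` and the toy data are hypothetical, and connected components of the synonymy graph serve only as a simple stand-in for a concept discovery method (they ignore the fact that one word may have several senses).

```python
from collections import defaultdict


def discover_concepts(synonym_pairs):
    """Group words into concepts as connected components of a synonymy graph.

    Assumption: connected components are a simplistic stand-in for a real
    concept discovery method; they do not separate the senses of a word.
    """
    graph = defaultdict(set)
    for a, b in synonym_pairs:
        graph[a].add(b)
        graph[b].add(a)

    seen, concepts = set(), []
    for word in sorted(graph):
        if word in seen:
            continue
        # Depth-first traversal collecting every word reachable from `word`.
        component, stack = set(), [word]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        concepts.append(frozenset(component))
    return concepts


def link_concepts(concepts, word_relations):
    """Lift word-level (hyponym, hypernym) pairs to concept-level relations."""
    concept_of = {word: concept for concept in concepts for word in concept}
    links = set()
    for hyponym, hypernym in word_relations:
        src, dst = concept_of.get(hyponym), concept_of.get(hypernym)
        # Skip words without a concept and relations inside a single concept.
        if src is not None and dst is not None and src != dst:
            links.add((src, dst))
    return links


# Hypothetical toy input: a synonym dictionary and word-level hypernymy pairs.
synonyms = [("car", "automobile"), ("automobile", "auto"), ("vehicle", "conveyance")]
hypernyms = [("car", "vehicle"), ("auto", "conveyance")]

concepts = discover_concepts(synonyms)
network = link_concepts(concepts, hypernyms)
```

In this toy input, both word-level pairs connect the same two concepts, so they collapse into a single concept-level relation. A real system would need sense-aware clustering and some filtering of noisy dictionary relations.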

Keywords


semantic network; lexical semantics; software engineering; free software; Semantic Web; VOWL; UML

DOI: http://dx.doi.org/10.14529/cmse170205