cavorite / spanish-wordnet

Tools and resources for the Spanish version of WordNet.

Clone this repository (size: 22.2 KB):

  HTTPS: hg clone https://bitbucket.org/cavorite/spanish-wordnet
  SSH:   hg clone ssh://hg@bitbucket.org/cavorite/spanish-wordnet

spanish-wordnet overview

Recent commits

Author                        Revision      Message
Juan Manuel Caicedo Carvajal  afee5f3b86ad  Updated docs.
Juan Manuel Caicedo Carvajal  cb54cec6f15c  Repository configuration.
Juan Manuel Caicedo Carvajal  42e563a13699  Script for importing the translations into a database.
Juan Manuel Caicedo Carvajal  aa72df31dd5b  Initial version.

===============
Spanish WordNet
===============

Tools for converting the Spanish translation of WordNet to the file formats
used for the English version.

Why:
  Because this allows developers and researchers to use the WordNet database with 
  the tools available for different platforms and programming languages.

How:
  By converting the XML files of the Spanish translation of WordNet into the
  common file formats.

Note:
  This is a work in progress.

Last update:
  2012-01-10

Ideas
=====

1. Download the English WordNet and edit it with extJWNL, replacing the words
   with the translations.

   - Start from the attribute wn_synset.spa for renaming the lemma.
     http://www.zentus.com/sqlitejdbc/
   
   - Translate the gloss word by word:
        wn_gloss
        wn_trad
        wn_sk (translation for verbs, nouns, adj)
        wn_variants

Notes
-----


- I wrote a script, `tools/dbimport.py`, that creates a database from the XML
  files with the Spanish translations. The database contains one table per
  file, with one row for each 'row' element and one column for each attribute
  of that element.
  The script is written in Python and uses SQLAlchemy as the ORM.

- Several tables refer to WordNet (WN) elements by their sense keys, which are
  described in the WN documentation:

    http://wordnet.princeton.edu/man/senseidx.5WN.html

  NLTK can look up a lemma from its sense key with the method:

    wordnet.lemma_from_key('.22-caliber%3:01:00::')

  Here is the documentation of the NLTK WN module:

  http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html


- extJWNL, the Java library for WN, provides a command-line tool (`ewn`) that
  can run a script of modifications to the database:

    usage edit: ewn -script filename
            filename contains edit commands as above, one sensekey per line. 
            For example:

        goal%1:09:00:: -add -addword end -setgloss "the state ... achieve it; ""the ends justify the means"""
        n#oxen -addexc ox


- The `Dictionary` class from extJWNL provides methods to get a `Word` object
  by its sense key (`getWordBySenseKey`) and to edit the dictionary (`edit`,
  `save`, `close`). These methods could be used to update the synset glosses
  with the Spanish translations.
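
As a rough illustration of the import step described in the first note (one
table per XML file, one row per 'row' element, attributes as columns), here is
a minimal sketch. It is not the code in `tools/dbimport.py`: it uses the
standard-library `sqlite3` and `xml.etree.ElementTree` modules instead of
SQLAlchemy, and the names `quote_ident` and `import_rows`, the table name
`sample`, and the sample attributes are assumptions made up for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def quote_ident(name):
    """Quote a table or column name for SQLite (embedded quotes are doubled)."""
    return '"%s"' % name.replace('"', '""')


def import_rows(conn, table, xml_text):
    """Load every <row> element of xml_text into `table`, one column per attribute."""
    rows = [dict(el.attrib) for el in ET.fromstring(xml_text).iter("row")]
    if not rows:
        return 0
    # The union of attribute names, in first-seen order, becomes the column list.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    column_sql = ", ".join(quote_ident(c) for c in columns)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS %s (%s)" % (quote_ident(table), column_sql)
    )
    # Attributes missing from a given row are stored as NULL.
    conn.executemany(
        "INSERT INTO %s (%s) VALUES (%s)"
        % (quote_ident(table), column_sql, ", ".join("?" for _ in columns)),
        [[row.get(c) for c in columns] for row in rows],
    )
    conn.commit()
    return len(rows)


# Made-up sample data; the real files and their attribute names may differ.
SAMPLE_XML = """<table>
  <row id="1" word="casa" pos="n"/>
  <row id="2" word="correr" pos="v"/>
</table>"""

conn = sqlite3.connect(":memory:")
imported = import_rows(conn, "sample", SAMPLE_XML)
words = [w for (w,) in conn.execute('SELECT word FROM "sample" ORDER BY id')]
```

Deriving the columns from the attributes keeps the sketch independent of any
particular file layout, at the cost of storing every value as untyped text.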

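The sense keys mentioned in the notes above follow the layout given in the
senseidx documentation, `lemma%ss_type:lex_filenum:lex_id:head_word:head_id`.
The sketch below splits a key into those fields and builds one edit-script
line of the kind shown in the `ewn` usage example. The helper names
`parse_sense_key` and `setgloss_line` are hypothetical, the quote-doubling
rule is inferred only from that usage example, and the Spanish gloss is a
placeholder. Whether `ewn` accepts a line with `-setgloss` alone has not been
verified here.

```python
from collections import namedtuple

SenseKey = namedtuple(
    "SenseKey", "lemma ss_type lex_filenum lex_id head_word head_id"
)

# Synset type codes as listed in the senseidx documentation.
SS_TYPES = {
    1: "noun",
    2: "verb",
    3: "adjective",
    4: "adverb",
    5: "adjective satellite",
}


def parse_sense_key(key):
    """Split 'lemma%ss_type:lex_filenum:lex_id:head_word:head_id' into fields."""
    # Split on the last '%' so punctuation inside the lemma stays intact.
    lemma, _, lex_sense = key.rpartition("%")
    ss_type, lex_filenum, lex_id, head_word, head_id = lex_sense.split(":")
    # head_word and head_id are empty unless the sense is an adjective satellite.
    return SenseKey(
        lemma, int(ss_type), int(lex_filenum), int(lex_id), head_word, head_id
    )


def setgloss_line(key, gloss):
    """Build one script line; embedded double quotes are doubled (assumption)."""
    return '%s -setgloss "%s"' % (key, gloss.replace('"', '""'))


parsed = parse_sense_key(".22-caliber%3:01:00::")
kind = SS_TYPES[parsed.ss_type]

# Placeholder gloss, not taken from the Spanish WordNet data.
line = setgloss_line("goal%1:09:00::", 'glosa de ejemplo con "comillas"')
```

Lines built this way could be written to a file, one sense key per line, and
passed to `ewn -script filename`.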

References
==========

extJWNL
http://extjwnl.sourceforge.net/

Spanish translation of WordNet
http://grial.uab.es/tools/download/


License and Authors
===================
See LICENSE.txt and AUTHORS.txt.


Contact
=======

Juan Manuel Caicedo
http://cavorite.com
juan@cavorite.com