Update README.md
README.md
CHANGED
|
@@ -1,12 +1,175 @@
|
|
| 1 |
---
|
| 2 |
-
title: Lojban Camxes
|
| 3 |
-
emoji:
|
| 4 |
colorFrom: gray
|
| 5 |
-
colorTo:
|
| 6 |
sdk: static
|
| 7 |
pinned: false
|
| 8 |
license: mit
|
| 9 |
app_file: camxes.html
|
| 10 |
---
|
| 11 |
|
| 12 |
-
|
| 1 |
---
|
| 2 |
+
title: Lojban Parser (Camxes JS)
|
| 3 |
+
emoji: 🌵
|
| 4 |
colorFrom: gray
|
| 5 |
+
colorTo: blue
|
| 6 |
sdk: static
|
| 7 |
pinned: false
|
| 8 |
license: mit
|
| 9 |
app_file: camxes.html
|
| 10 |
---
|
| 11 |
|
| 12 |
+
la ilmentufa
|
| 13 |
+
=========
|
| 14 |
+
|
| 15 |
+
_[la ilmentufa](http://lojban.org/papri/la_ilmentufa)_ is a collection of formal grammars and syntactical parsers for the Lojban language, as well as related tools and interfaces.
|
| 16 |
+
|
| 17 |
+
It currently includes five main PEG formal grammars along with their corresponding JavaScript parsers, which are generated automatically from the grammar files. The PEG grammar files have the extension `.peg` (e.g. `camxes.peg`), and each parser has the same name as its grammar but with a `.js` extension.
|
| 18 |
+
|
| 19 |
+
* `camxes.peg`: Standard PEG grammar for Lojban.
|
| 20 |
+
* `camxes-beta.peg`: Same as camxes.peg, but with the addition of the most popular and backward-compatible experimental cmavo and grammar changes.
|
| 21 |
+
* `camxes-beta-cbm.peg`: Same as camxes-beta.peg, but with the Cmevla-Brivla Merger experimental grammar change.
|
| 22 |
+
* `camxes-beta-cbm-ckt.peg`: Same as above, but with [Ce-Ki-Tau](https://mw.lojban.org/papri/ce_ki_tau_jau) experimental grammar.
|
| 23 |
+
* `camxes-exp.peg`: Experimental grammar sandbox.
|
| 24 |
+
|
| 25 |
+
Main interfaces to the parsers:
|
| 26 |
+
* `camxes.html`: HTML interface with various parsing options; it also lets you select the desired parser.
|
| 27 |
+
* `glosser/glosser.htm`: Another HTML interface with different features, most prominently nested boxes output and glosses.
|
| 28 |
+
* `run_camxes.js`: Command line interface.
|
| 29 |
+
* `ircbot/camxes-bot.js`: IRC bot interface.
|
| 30 |
+
|
| 31 |
+
|
| 32 |
+
### Requirements ###
|
| 33 |
+
|
| 34 |
+
To generate a PEGJS grammar engine from its PEG grammar file, or to run the IRC bot interfaces, you need [Node.js](https://nodejs.org/) installed on your machine.
|
| 35 |
+
|
| 36 |
+
To generate a PEGJS engine, you may need the [Node.js module `pegjs`](http://pegjs.org/).
|
| 37 |
+
To run the IRC bots, you may need the [Node.js module `irc`](https://github.com/martynsmith/node-irc).
|
| 38 |
+
|
| 39 |
+
However, since the necessary `node_modules` are already included in this project, you probably won't have to download any of these modules. ;)
|
| 40 |
+
|
| 41 |
+
|
| 42 |
+
### Building a PEGJS parser ###
|
| 43 |
+
|
| 44 |
+
To generate a PEGJS parser from a `.peg` grammar file, set the Ilmentufa directory as the working directory and run the following commands (using the `camxes.peg` grammar as the example):
|
| 45 |
+
|
| 46 |
+
```
|
| 47 |
+
nodejs pegjs_conv camxes.peg
|
| 48 |
+
nodejs build-camxes camxes.pegjs
|
| 49 |
+
```
|
| 50 |
+
|
| 51 |
+
(In some installations, the `nodejs` command doesn't exist; use `node` instead in the above commands.)
|
| 52 |
+
|
| 53 |
+
The first command (with `pegjs_conv`) converts the pure PEG grammar file (`*.peg`) to PEGJS format, creating or updating the file `camxes.pegjs` in this example.
|
| 54 |
+
The second command (with `build-camxes`) creates or updates the corresponding parser engine, `camxes.js` in this case.
|
| 55 |
+
|
| 56 |
+
Building the parser can take several dozen seconds.
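
The two build steps can also be chained from a small Node.js script. The following is only a sketch: `buildSteps` and `buildParser` are hypothetical helper names, not part of this project, and the derived `.pegjs` file name simply follows the `camxes.peg` to `camxes.pegjs` example given above.

```javascript
// Sketch only: chains the two documented build steps for one grammar file.
const { execFileSync } = require("child_process");

// Derive both commands from a grammar file name such as "camxes.peg".
// `nodeBin` is "nodejs" or "node", depending on the installation.
function buildSteps(grammarFile, nodeBin = "nodejs") {
  if (!grammarFile.endsWith(".peg")) {
    throw new Error("Expected a .peg grammar file, got: " + grammarFile);
  }
  const pegjsFile = grammarFile.replace(/\.peg$/, ".pegjs");
  return [
    [nodeBin, ["pegjs_conv", grammarFile]], // step 1: PEG -> PEGJS
    [nodeBin, ["build-camxes", pegjsFile]], // step 2: PEGJS -> parser engine
  ];
}

// Hypothetical helper: run both steps in order.
// Must be called with the Ilmentufa directory as the working directory.
function buildParser(grammarFile, nodeBin) {
  for (const [bin, args] of buildSteps(grammarFile, nodeBin)) {
    execFileSync(bin, args, { stdio: "inherit" });
  }
}
```

Calling `buildParser("camxes-beta.peg", "node")` would run the same two commands for the Beta grammar; `execFileSync` throws if either step exits with an error, so the second step is skipped when the first one fails.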
|
| 57 |
+
|
| 58 |
+
|
| 59 |
+
### Generating CBM and CKT grammars ###
|
| 60 |
+
|
| 61 |
+
The current `camxes-beta-cbm.peg` and `camxes-beta-cbm-ckt.peg` are generated from `camxes-beta.peg` using a couple of scripts.
|
| 62 |
+
Here's how to do it:
|
| 63 |
+
|
| 64 |
+
```
|
| 65 |
+
nodejs std-to-cbm camxes-beta.peg
|
| 66 |
+
nodejs make-ckt camxes-beta-cbm.peg
|
| 67 |
+
```
|
| 68 |
+
|
| 69 |
+
|
| 70 |
+
### Running a parser from command line ###
|
| 71 |
+
|
| 72 |
+
Here's how to parse the Lojban text "coi ro do" with the standard grammar parser from the command line:
|
| 73 |
+
|
| 74 |
+
```
|
| 75 |
+
nodejs run_camxes "coi ro do"
|
| 76 |
+
```
|
| 77 |
+
|
| 78 |
+
The standard grammar parser is used by default, but another grammar engine can be specified.
|
| 79 |
+
* The `-std` flag selects the standard grammar engine.
|
| 80 |
+
* The `-beta` flag selects the Beta grammar engine.
|
| 81 |
+
* The `-cbm` flag selects the Cmevla-Brivla Merger grammar engine.
|
| 82 |
+
* The `-ckt` flag selects the Ce-Ki-Tau grammar engine.
|
| 83 |
+
* The `-exp` flag selects the experimental or sandbox grammar engine.
|
| 84 |
+
* `-p PATH` can be used for selecting a parser by giving its file path as a command line argument.
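
To make the selection rules concrete, here is a minimal sketch of how the engine flags could be resolved to a parser file. It is not the actual code of `run_camxes.js`: `ENGINE_FILES` and `resolveParserFile` are hypothetical names, and the flag-to-file mapping is an assumption based on the grammar list at the top of this file.

```javascript
// Assumed mapping from engine flag to generated parser file (illustration only).
const ENGINE_FILES = {
  "-std": "camxes.js",
  "-beta": "camxes-beta.js",
  "-cbm": "camxes-beta-cbm.js",
  "-ckt": "camxes-beta-cbm-ckt.js",
  "-exp": "camxes-exp.js",
};

// Return the parser file selected by a list of command line arguments.
// The standard engine is the default; a later flag overrides an earlier one.
function resolveParserFile(args) {
  let file = ENGINE_FILES["-std"];
  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    if (arg === "-p" && i + 1 < args.length) {
      file = args[++i]; // explicit parser path
    } else if (arg === "-m") {
      i++; // skip the MODE value, it does not select an engine
    } else if (Object.prototype.hasOwnProperty.call(ENGINE_FILES, arg)) {
      file = ENGINE_FILES[arg];
    }
  }
  return file;
}
```

For instance, `resolveParserFile(["-beta", "-m", "CTN", "coi ro do"])` yields `"camxes-beta.js"` under these assumptions.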
|
| 85 |
+
|
| 86 |
+
Additionally, `-m MODE` can be used to specify output postprocessing options.
|
| 87 |
+
Here, MODE is a string of letters, each letter standing for a specific option.
|
| 88 |
+
Here is the list of possible letters and their meanings:
|
| 89 |
+
* M -> Keep morphology
|
| 90 |
+
* S -> Show spaces
|
| 91 |
+
* C -> Show word classes (selmaho)
|
| 92 |
+
* T -> Show terminators
|
| 93 |
+
* N -> Show main node labels
|
| 94 |
+
* R -> Raw output, do not prune the tree, except the morphology if 'M' not present.
|
| 95 |
+
* J -> JSON output
|
| 96 |
+
* G -> Replace words by glosses
|
| 97 |
+
* L -> Run the parser in a loop: consume each newline-terminated input line and output its parse result
|
| 98 |
+
* LL -> With a second 'L', run_camxes expects every input line to begin with a mode string (possibly empty) followed by a space, after which the actual input follows.
|
| 99 |
+
|
| 100 |
+
Example:
|
| 101 |
+
```
|
| 102 |
+
nodejs run_camxes -m CTN "coi ro do"
|
| 103 |
+
```
|
| 104 |
+
This will show terminators, selmaho and main node labels.
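
As an illustration of how a MODE string breaks down, the sketch below decodes it into named options and splits an input line in double-`L` mode. The names `MODE_LETTERS`, `decodeMode` and `splitLoopLine` are hypothetical and the option names are made up for readability; rejecting unknown letters and treating a line without a space as plain text are choices of this sketch, not documented behavior of `run_camxes.js`.

```javascript
// Letters documented above, mapped to illustrative option names.
const MODE_LETTERS = {
  M: "keepMorphology",
  S: "showSpaces",
  C: "showWordClasses",
  T: "showTerminators",
  N: "showNodeLabels",
  R: "rawOutput",
  J: "jsonOutput",
  G: "glosses",
};

// Turn a mode string such as "CTN" into an options object.
// loopLevel counts the 'L' letters: 0 = single run, 1 = loop, 2 = loop with per-line modes.
function decodeMode(mode) {
  const options = { loopLevel: 0 };
  for (const name of Object.values(MODE_LETTERS)) options[name] = false;
  for (const letter of mode) {
    if (letter === "L") {
      options.loopLevel = Math.min(options.loopLevel + 1, 2);
    } else if (Object.prototype.hasOwnProperty.call(MODE_LETTERS, letter)) {
      options[MODE_LETTERS[letter]] = true;
    } else {
      throw new Error("Unknown mode letter: " + letter);
    }
  }
  return options;
}

// In double-L mode each input line is "MODE TEXT", where MODE may be empty.
function splitLoopLine(line) {
  const space = line.indexOf(" ");
  if (space === -1) return { mode: "", text: line };
  return { mode: line.slice(0, space), text: line.slice(space + 1) };
}
```

With these definitions, `decodeMode("CTN")` switches on word classes, terminators and node labels only, and `splitLoopLine("CTN coi ro do")` separates the mode `"CTN"` from the text `"coi ro do"`.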
|
| 105 |
+
|
| 106 |
+
|
| 107 |
+
### Running the IRC bots ###
|
| 108 |
+
|
| 109 |
+
Nothing could be easier: from the ilmentufa directory, run the command `nodejs ircbot/camxes-bot` or `nodejs ircbot/cipra-bot` (the latter is for the experimental grammar).
|
| 110 |
+
The list of the channels joined by the bot can be found and edited within the bot script.
|
| 111 |
+
|
| 112 |
+
|
| 113 |
+
### How to use one of these parsers in an HTML interface project ###
|
| 114 |
+
|
| 115 |
+
To use a JavaScript Lojban parser in an HTML interface, you'll need to add the desired `.js` parser (e.g. `camxes.js`)
|
| 116 |
+
to your HTML interface.
|
| 117 |
+
You may also want to include `camxes_preproc.js` and `camxes_postproc.js`, which provide useful features. The former does optional preprocessing of Lojban text, such as replacing digits with the corresponding PA cmavo or converting nonstandard spellings and scripts into normal Latin-based Lojban text. The latter, the postprocessor, provides a function for trimming or prettifying the parse tree generated by the parser according to the chosen postprocessing options.
|
| 118 |
+
|
| 119 |
+
Here's a simple code example:
|
| 120 |
+
|
| 121 |
+
```
|
| 122 |
+
<script type="text/javascript" src="camxes.js"></script>
|
| 123 |
+
<script type="text/javascript" src="camxes_preproc.js"></script>
|
| 124 |
+
<script type="text/javascript" src="camxes_postproc.js"></script>
|
| 125 |
+
<script>
|
| 126 |
+
function run_camxes(lojban_text) {
|
| 127 |
+
// We preprocess (if desired) the text using the function provided in camxes_preproc.js:
|
| 128 |
+
lojban_text = camxes_preprocessing(lojban_text);
|
| 129 |
+
// We run the Camxes parser and get the parse tree it generated:
|
| 130 |
+
try {
|
| 131 |
+
var parse_tree = camxes.parse(lojban_text);
|
| 132 |
+
} catch (err) {
|
| 133 |
+
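DISPLAY(err.toString()); // DISPLAY is a placeholder for your own output function
return; // Parsing failed, so there is no parse tree to postprocess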
|
| 134 |
+
}
|
| 135 |
+
// We postprocess (if desired) the parse tree using the function provided in camxes_postproc.js:
|
| 136 |
+
var postproc_options = "CTN"; // Those are the same options as those used by run_camxes.js
|
| 137 |
+
var result = camxes_postprocessing(parse_tree, postproc_options);
|
| 138 |
+
// `result` is a string representation of the trimmed parse tree.
|
| 139 |
+
DISPLAY(result);
|
| 140 |
+
}
|
| 141 |
+
</script>
|
| 142 |
+
```
|
| 143 |
+
|
| 144 |
+
The postproc options are the same as those of run_camxes.js described earlier in this file; please refer to the section `Running a parser from command line` above for more details.
|
| 145 |
+
|
| 146 |
+
You can also look into `camxes.html`'s code to see a real example of using the Lojban parsers in an HTML interface.
|
| 147 |
+
|
| 148 |
+
|
| 149 |
+
### Using a Javascript Lojban parser from another program or using a programming language other than Javascript ###
|
| 150 |
+
|
| 151 |
+
You can run a parser by making your program execute the following example command line:
|
| 152 |
+
|
| 153 |
+
```
|
| 154 |
+
nodejs run_camxes.js -m J "jbobau vlamei"
|
| 155 |
+
```
|
| 156 |
+
(You can adapt it by providing other option flags described in the `Running a parser from command line` section above, but you'll want to keep the J option flag, so that `run_camxes.js` outputs a JSON-stringified array that is easy for your program to read.)
|
| 157 |
+
|
| 158 |
+
`run_camxes.js` will then write the output parse tree to its `stdout`, so your program needs to read that `stdout` to get the parse result.
|
| 159 |
+
|
| 160 |
+
For example, in a Python script, you can execute run_camxes.js with the `subprocess` Python module, and then read its output this way:
|
| 161 |
+
|
| 162 |
+
```
|
| 163 |
+
import subprocess
|
| 164 |
+
try:
|
| 165 |
+
    import simplejson as json
|
| 166 |
+
except ImportError:
|
| 167 |
+
    import json
|
| 168 |
+
|
| 169 |
+
command = ['nodejs', 'run_camxes.js', '-m', 'J', 'jbobau vlamei']
|
| 170 |
+
output = subprocess.check_output(command)
|
| 171 |
+
json_string = output.decode("utf-8")
|
| 172 |
+
|
| 173 |
+
# Converting the JSON output back to nested lists:
|
| 174 |
+
parse_tree = json.loads(json_string)
|
| 175 |
+
```
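
The same approach works from a separate Node.js program. The sketch below uses Node's built-in `child_process.execFileSync`; `camxesArgs` and `parseLojban` are hypothetical helper names, and placing an engine flag before `-m` assumes that the order of flags does not matter to `run_camxes.js`.

```javascript
const { execFileSync } = require("child_process");

// Build the argument list for run_camxes.js.
// The "J" option is always added so that the output is JSON.
function camxesArgs(text, extraMode = "", engineFlag = null) {
  const mode = extraMode.includes("J") ? extraMode : extraMode + "J";
  const args = ["run_camxes.js"];
  if (engineFlag) args.push(engineFlag); // e.g. "-beta"; omitted by default
  args.push("-m", mode, text);
  return args;
}

// Run the parser and decode its stdout back into nested arrays.
// Must be called with the Ilmentufa directory as the working directory.
function parseLojban(text, extraMode, engineFlag, nodeBin = "nodejs") {
  const stdout = execFileSync(nodeBin, camxesArgs(text, extraMode, engineFlag), {
    encoding: "utf8",
  });
  return JSON.parse(stdout);
}
```

`parseLojban("jbobau vlamei")` runs the same command as the Python example above. `JSON.parse` throws if the output is not valid JSON, so callers may want to wrap the call in `try`/`catch`.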
|