Pendrokar committed on
Commit
d1f0b69
·
verified ·
1 Parent(s): 4467393

Update README.md

  1. README.md +167 -4
README.md CHANGED
@@ -1,12 +1,175 @@
1
  ---
2
- title: Lojban Camxes Js
3
- emoji: 📚
4
  colorFrom: gray
5
- colorTo: indigo
6
  sdk: static
7
  pinned: false
8
  license: mit
9
  app_file: camxes.html
10
  ---
11
 
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

1
  ---
2
+ title: Lojban Parser (Camxes JS)
3
+ emoji: 🌵
4
  colorFrom: gray
5
+ colorTo: blue
6
  sdk: static
7
  pinned: false
8
  license: mit
9
  app_file: camxes.html
10
  ---
11
 
12
+ la ilmentufa
13
+ =========
14
+
15
+ _[la ilmentufa](http://lojban.org/papri/la_ilmentufa)_ is a collection of formal grammars and syntactical parsers for the Lojban language, as well as related tools and interfaces.
16
+
17
+ It currently includes five main PEG formal grammars along with their corresponding JavaScript parsers, which are generated automatically from the grammar files. The PEG grammar files have the extension `.peg` (e.g. `camxes.peg`), and each parser has the same name as its grammar but with a `.js` extension.
18
+
19
+ * `camxes.peg`: Standard PEG grammar for Lojban.
20
+ * `camxes-beta.peg`: Same as camxes.peg, but with the addition of the most popular and backward-compatible experimental cmavo and grammar changes.
21
+ * `camxes-beta-cbm.peg`: Same as camxes-beta.peg, but with the Cmevla-Brivla Merger experimental grammar change.
22
+ * `camxes-beta-cbm-ckt.peg`: Same as above, but with [Ce-Ki-Tau](https://mw.lojban.org/papri/ce_ki_tau_jau) experimental grammar.
23
+ * `camxes-exp.peg`: Experimental grammar sandbox.
24
+
25
+ Main interfaces to the parsers:
26
+ * `camxes.html`: HTML interface with various parsing options and a selector for the desired parser.
27
+ * `glosser/glosser.htm`: Another HTML interface with different features, most prominently nested boxes output and glosses.
28
+ * `run_camxes.js`: Command line interface.
29
+ * `ircbot/camxes-bot.js`: IRC bot interface.
30
+
31
+
32
+ ### Requirements ###
33
+
34
+ To generate a PEGJS grammar engine from its PEG grammar file, or to run the IRC bot interfaces, you need [Node.js](https://nodejs.org/) installed on your machine.
35
+
36
+ For generating a PEGJS engine, you may need to get the [Node.js module `pegjs`](http://pegjs.org/).
37
+ For running the IRC bots, you may need to get the [Node.js module `irc`](https://github.com/martynsmith/node-irc).
38
+
39
+ However, as the necessary `node_modules` are already included in this project, you probably won't have to download any of the aforementioned modules. ;)
40
+
41
+
42
+ ### Building a PEGJS parser ###
43
+
44
+ To generate a PEGJS parser from a `.peg` grammar file, set the Ilmentufa directory as the working directory and run the following commands (using the `camxes.peg` grammar as the example):
45
+
46
+ ```
47
+ nodejs pegjs_conv camxes.peg
48
+ nodejs build-camxes camxes.pegjs
49
+ ```
50
+
51
+ (In some installations, the `nodejs` command is not available; use `node` instead in the above commands.)
52
+
53
+ The first command (with `pegjs_conv`) converts the pure PEG grammar file (`*.peg`) to PEGJS format, creating or updating the file `camxes.pegjs` in this example.
54
+ The second command (with `build-camxes`) creates or updates the corresponding parser engine, `camxes.js` in this case.
55
+
56
+ Building the parser can take several tens of seconds.
57
+
58
+
59
+ ### Generating CBM and CKT grammars ###
60
+
61
+ The current `camxes-beta-cbm.peg` and `camxes-beta-cbm-ckt.peg` are generated from `camxes-beta.peg` using a couple of scripts.
62
+ Here's how to do it:
63
+
64
+ ```
65
+ nodejs std-to-cbm camxes-beta.peg
66
+ nodejs make-ckt camxes-beta-cbm.peg
67
+ ```
68
+
69
+
70
+ ### Running a parser from command line ###
71
+
72
+ Here's how to parse the Lojban text "coi ro do" with the standard grammar parser from the command line:
73
+
74
+ ```
75
+ nodejs run_camxes "coi ro do"
76
+ ```
77
+
78
+ The standard grammar parser is used by default, but another grammar engine can be specified.
79
+ * The `-std` flag selects the standard grammar engine.
80
+ * The `-beta` flag selects the Beta grammar engine.
81
+ * The `-cbm` flag selects the Cmevla-Brivla Merger grammar engine.
82
+ * The `-ckt` flag selects the Ce-Ki-Tau grammar engine.
83
+ * The `-exp` flag selects the experimental or sandbox grammar engine.
84
+ * `-p PATH` can be used for selecting a parser by giving its file path as a command line argument.
85
+
86
+ Additionally, `-m MODE` can be used to specify output postprocessing options.
87
+ Here, MODE can be any letter string, each letter standing for a specific option.
88
+ Here is the list of possible letters and their meanings:
89
+ * M -> Keep morphology
90
+ * S -> Show spaces
91
+ * C -> Show word classes (selmaho)
92
+ * T -> Show terminators
93
+ * N -> Show main node labels
94
+ * R -> Raw output, do not prune the tree, except the morphology if 'M' not present.
95
+ * J -> JSON output
96
+ * G -> Replace words by glosses
97
+ * L -> Run the parser in a loop, consume every input line terminated by a newline and output parsed result
98
+ * LL -> With a second 'L', run_camxes expects every input line to begin with a mode string (possibly empty) followed by a space, after which the actual input follows.
99
+
100
+ Example:
101
+ ```
102
+ nodejs run_camxes -m CTN "coi ro do"
103
+ ```
104
+ This will show terminators, selmaho and main node labels.
105
+
106
+
107
+ ### Running the IRC bots ###
108
+
109
+ Nothing easier; after having entered the ilmentufa directory, run the command `nodejs ircbot/camxes-bot` or `nodejs ircbot/cipra-bot` (the latter is for the experimental grammar).
110
+ The list of the channels joined by the bot can be found and edited within the bot script.
111
+
112
+
113
+ ### How to use one of these parsers in an HTML interface project ###
114
+
115
+ To use a JavaScript Lojban parser in an HTML interface, you'll need to include the desired `.js` parser (e.g. `camxes.js`)
116
+ in your HTML interface.
117
+ You may also want to include `camxes_preproc.js` and `camxes_postproc.js`, which provide useful features. The former does optional preprocessing of Lojban text, such as replacing digits with the corresponding PA cmavo or converting nonstandard spellings or scripts into normal Latin-based Lojban text. The latter script, the postprocessor, provides a function for trimming or prettifying the parse tree generated by the parser according to the chosen postprocessing options.
118
+
119
+ Here's a simple code example:
120
+
121
+ ```
122
+ <script type="text/javascript" src="camxes.js"></script>
123
+ <script type="text/javascript" src="camxes_preproc.js"></script>
124
+ <script type="text/javascript" src="camxes_postproc.js"></script>
125
+ <script>
126
+ function run_camxes(lojban_text) {
127
+     // We preprocess (if desired) the text using the function provided in camxes_preproc.js:
128
+     lojban_text = camxes_preprocessing(lojban_text);
129
+     // We run the Camxes parser and get the parse tree it generated:
130
+     try {
131
+         var parse_tree = camxes.parse(lojban_text);
132
+     } catch (err) {
133
+         DISPLAY(err.toString()); return; // DISPLAY stands for your own output function; stop here, there is no tree to postprocess
134
+     }
135
+     // We postprocess (if desired) the parse tree using the function provided in camxes_postproc.js:
136
+     var postproc_options = "CTN"; // These are the same options as those used by run_camxes.js
137
+     var result = camxes_postprocessing(parse_tree, postproc_options);
138
+     // `result` is a string representation of the trimmed parse tree.
139
+     DISPLAY(result);
140
+ }
141
+ </script>
142
+ ```
143
+
144
+ The postproc options are the same as those of run_camxes.js described earlier in this file; please refer to the section `Running a parser from command line` above for more details.
145
+
146
+ You can also look into `camxes.html`'s code to see a real example of using the Lojban parsers in an HTML interface.
147
+
148
+
149
+ ### Using a JavaScript Lojban parser from another program or using a programming language other than JavaScript ###
150
+
151
+ You can run a parser by making your program execute the following example command line:
152
+
153
+ ```
154
+ nodejs run_camxes.js -m J camxes.js "jbobau vlamei"
155
+ ```
156
+ (You can adapt it by providing other option flags described in the `Running a parser from command line` section above, but you'll want to keep the J option flag, so that `run_camxes.js` outputs a JSON stringified array that will be easy for your program to read.)
157
+
158
+ `run_camxes.js` will then write the output parse tree to its `stdout`, so your program needs to read that `stdout` to get the parse result.
159
+
160
+ For example, in a Python script, you can execute run_camxes.js with the `subprocess` Python module, and then read its output this way:
161
+
162
+ ```
163
+ import subprocess
164
+ try:
165
+     import simplejson as json
166
+ except ImportError:
167
+     import json
168
+
169
+ command = ['nodejs', 'run_camxes.js', '-m', 'J', 'jbobau vlamei']
170
+ output = subprocess.check_output(command)
171
+ json_string = output.decode("utf-8")
172
+
173
+ # Converting the JSON output back to nested lists:
174
+ parse_tree = json.loads(json_string)
175
+ ```
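
The Python snippet above can be wrapped into a small reusable helper. The following is only a sketch under stated assumptions, not part of ilmentufa: the function names `build_command` and `parse_lojban` are hypothetical, placing the grammar flag before `-m` is an assumption (the README only says such flags exist), and how `run_camxes.js` reports a failed parse in JSON mode is not specified here, so `json.loads` may raise `ValueError` on unexpected output.

```python
import json
import subprocess

# Mode letters documented in "Running a parser from command line".
# 'L' (loop mode) is left out because this helper makes one-shot calls.
KNOWN_MODE_LETTERS = set("MSCTNRJG")


def build_command(text, mode="J", grammar_flag=None, node="nodejs"):
    """Build the argument list for run_camxes.js (hypothetical helper)."""
    unknown = set(mode) - KNOWN_MODE_LETTERS
    if unknown:
        raise ValueError("unknown mode letters: " + "".join(sorted(unknown)))
    if "J" not in mode:
        mode += "J"  # JSON output is required so the result can be decoded
    command = [node, "run_camxes.js"]
    if grammar_flag is not None:
        # e.g. "-beta"; putting it before -m is an assumption
        command.append(grammar_flag)
    command += ["-m", mode, text]
    return command


def parse_lojban(text, **kwargs):
    """Run the parser once and return the decoded parse tree."""
    try:
        output = subprocess.check_output(build_command(text, **kwargs))
    except (OSError, subprocess.CalledProcessError) as err:
        # Node.js missing, wrong working directory, or non-zero exit status
        raise RuntimeError("could not run the parser: %s" % err)
    # May raise ValueError if the output is not valid JSON
    return json.loads(output.decode("utf-8"))
```

A call such as `parse_lojban("coi ro do", mode="CTN", grammar_flag="-beta")` would have to be made from the ilmentufa directory; pass `node="node"` on installations where the `nodejs` command does not exist.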