From a0242f55a2de936d77dbd54184d949255341db53 Mon Sep 17 00:00:00 2001 From: Ralph Amissah Date: Thu, 13 Nov 2014 13:58:57 -0500 Subject: org files related to sisu, break up and place in own subdir --- data/doc/sisu/org/sisu.org | 680 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 680 insertions(+) create mode 100644 data/doc/sisu/org/sisu.org (limited to 'data/doc/sisu/org/sisu.org') diff --git a/data/doc/sisu/org/sisu.org b/data/doc/sisu/org/sisu.org new file mode 100644 index 00000000..125cf9ae --- /dev/null +++ b/data/doc/sisu/org/sisu.org @@ -0,0 +1,680 @@ +#+PRIORITIES: A F E +(emacs:evil mode gifts a "vim" of enticing "alternative" powers! ;) + +* General + +** what is sisu? + +Multiple output formats with a nod to the strengths of each output format and +the ability to cite text easily across output formats. + +*** debian/control desc + +documents - structuring, publishing in multiple formats and search + SiSU is a lightweight markup based, command line oriented, document + structuring, publishing and search framework for document collections. + . + With minimal preparation of a plain-text (UTF-8) file, using sisu markup + syntax in your text editor of choice, SiSU can generate various document + formats (most of which share a common object numbering system for locating + content), including plain text, HTML, XHTML, XML, EPUB, OpenDocument text + (ODF:ODT), LaTeX, PDF files, and populate an SQL database with objects + (roughly paragraph-sized chunks) so searches may be performed and matches + returned with that degree of granularity. Think being able to finely match + text in documents, using common object numbers, across different output + formats and across languages if you have translations of the same document. + For search, your criteria is met by these documents at these locations within + each document (equally relevant across different output formats and + languages). 
To be clear (if obvious) page numbers provide none of this + functionality. Object numbering is particularly suitable for "published" works + (finalized texts as opposed to works that are frequently changed or updated) + for which it provides a fixed means of reference of content. Document outputs + can also share provided semantic meta-data. + . + SiSU also provides concordance files, document content certificates and + manifests of generated output. SiSU provides the means to make book indexes + that make use of its object numbering. + . + A vim syntax highlighting file and an ftplugin with folds for sisu markup are + provided. Vim 7 includes syntax highlighting for SiSU. Some syntax highlighting + is also available for Emacs and a few other editors. + . + Dependencies for various features are taken care of in sisu related packages. + The package sisu-complete installs the whole of SiSU. + . + Additional document markup samples are provided in the package + sisu-markup-samples which is found in the non-free archive. The license for + the substantive content of the marked up documents provided is that provided + by the author or original publisher. + . + SiSU uses utf-8 & parses left to right. Currently supported languages: + am bg bn br ca cs cy da de el en eo es et eu fi fr ga gl he hi hr hy ia is it + ja ko la lo lt lv ml mr nl nn no oc pl pt pt_BR ro ru sa se sk sl sq sr sv ta + te th tk tr uk ur us vi zh (see XeTeX polyglossia & cjk) + . + SiSU works well under po4a translation management, for which an administrative + sample Rakefile is provided with sisu_manual under markup-samples. + +*** multiple document formats + +Text can be represented in multiple output formats with different +characteristics that are (or may be) regarded as strengths/advantages and +therefore preferred in different contexts.
+ +Given the different strengths and characteristics of various output formats, it +makes little sense to try too hard to make different representations of a +document look the same. More interesting is to have document representations that +take advantage of each given output's strengths. As valuable, if not more so, is +the ability to cite, find, discuss text with ease, across the different output +formats. + +For citation across output formats, SiSU uses object citation numbers. + +*** document structure and document objects + +SiSU breaks marked up text into document structure and objects. + +Document structure is the document heading hierarchy (having separated out +the document header). + +**** What are document objects? +An object is an identified meaningful unit of a document, most commonly a +paragraph of text, but also for example a table, code block, verse or image. + +SiSU tracks these substantive document units as document objects (and their +relationship to the document structure). + +*** object citation numbers + +**** What are object citation numbers? + +An object citation number is a sequential number assigned to a document object. + +In sisu, output documents share this common object numbering system (dubbed +"object citation numbering" (ocn)) that is meaningful (machine & human readable) +across various digital outputs whether paper, screen, or database oriented, +(PDF, html, XML, EPUB, sqlite, postgresql), and across multilingual content if +prepared appropriately. This numbering system can be used to reference content +across output types. + +**** Why might I want object citation numbering? + +The ability to cite and quickly locate text can be invaluable if not essential +(whether for instruction or discussion). + +In this digital & Internet age we have multiple ways to represent documents and +multiple document output formats as options with different characteristics, +strengths/advantages etc.
We need a way to cite text that works and is relevant +independent of the document format used. + +I want to discuss (cite) html text how do I do this? +how do I refer to / cite / discuss text in html? +Issue: html may be viewed online or printed, it is not tied to paper (as +e.g. pdf) and prints differently depending on selected font face and font size. + +I want to discuss (cite) text that is available in multiple formats (e.g. pdf, +epub, html) without having to worry about the output format that is referred +to. +How do I refer to / discuss text that is available in more than one format, +uncertain of what format is preferred, used or available to my colleagues? +e.g. html and epub or pdf have rather different text representations, how do I +discuss ... + +I would like to have a book index that is relevant (can be used) across multiple +output formats (e.g. pdf, epub, html) + +How do I make a book index (or a concordance file) that works across multiple +output formats? + +I would like to have search results indicating where in a document matches are +found and I would like it to be relevant across available output formats (e.g. +pdf, epub, html) +How do I get search results for locations of text within each relevant document + +I would like to be able to discuss a text that has been translated ... +how do I find text across languages? +Where I have a nicely translated document, how do I point to or discuss with my +foreign language counterpart some detail of the text, or, how do I point my +foreign language counterpart to the text I would like to bring to his +attention. + +*** "Granular" Search + +Of interest is the ease of streaming documents to a relational database, at an +object (roughly paragraph) level and the potential for increased precision in +the presentation of matches that results thereby. The ability to serialize +html, LaTeX, XML, SQL, (whatever) is also inherent in / incidental to the +design. 
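The idea of numbering objects and returning search matches by object number can be sketched in a few lines of Ruby. This is a minimal illustration only, not SiSU's code or database schema: the helper names (number_objects, search), the blank-line paragraph splitting, and the in-memory array standing in for an SQL table are all assumptions made for the example.

```ruby
# Hypothetical sketch: NOT SiSU's implementation or schema.
# An array of hashes stands in for the SQL table that would be populated.

# Split plain text into paragraph-sized objects (separated by blank lines)
# and give each one a sequential object citation number (ocn), starting at 1.
def number_objects(source)
  source.split(/\n\s*\n/)
        .map(&:strip)
        .reject(&:empty?)
        .each_with_index
        .map { |text, i| { ocn: i + 1, text: text } }
end

# Return the ocn of every object whose text matches the pattern.
def search(objects, pattern)
  objects.select { |obj| obj[:text].match?(pattern) }
         .map { |obj| obj[:ocn] }
end

sample = <<~TEXT
  SiSU breaks marked up text into document structure and objects.

  An object is most commonly a paragraph of text.

  Object numbers can be used to cite text across output formats.
TEXT

objects = number_objects(sample)
puts search(objects, /cite/i).inspect # prints [3]
```

Because a match is reported as an object number rather than a page or byte offset, the same answer ("object 3") would point to the same text in any output generated from that source, which is the property the sections above describe.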
+ +*** Summary +SiSU information Structuring Universe, or +Structured information, Serialized Units: +software for electronic texts, document collections, +books, digital libraries, and search, with "atomic search" and text positioning +system (shared text citation numbering: "ocn") +outputs include: plaintext, html, XHTML, XML, ODF (OpenDocument), EPUB, LaTeX, +PDF, SQL (PostgreSQL and SQLite) + +*** SiSU Short Description + +SiSU is a comprehensive future-proofing electronic document management system. +Built-in search capabilities allow you to search across multiple documents and +highlight matches in an easy-to-follow format. A paragraph numbering system +allows you to cite your electronic documents in a consistent manner across +multiple file formats. Multiple format outputs allow you to display your +documents in plain text, PDF (portrait and landscape), OpenDocument format, +HTML, or e-book reading format (EPUB). Word mapping allows you to easily create +word indexes for your documents. Future-proofing flexibility allows you to +quickly adapt your documents to newer output formats as needed. All these and +many other features are achieved with little or no additional work on your +documents - by marking up the documents with a very simple markup +language, leaving the SiSU engine to handle the heavy-lifting processing. + +Potential users of SiSU include individual authors who want to publish their +books or articles electronically to reach a broad audience, web publishers who +want to provide multiple channels of access to their electronic documents, or +any organization which centrally manages a medium or large set of electronic +documents, especially governmental organizations which may prefer to keep their +documents in easily accessible yet non-proprietary formats. + +SiSU is an Open Source project initiated and led by Ralph Amissah, +who can be contacted via the mailing list. SiSU is +licensed under the GNU General Public License.
+ +**** notes + +For less markup than the most elementary HTML you can have more. SiSU - +Structured information, Serialized Units for electronic documents, is an +information structuring, transforming, publishing and search framework with the +following features: + +(i) markup syntax: (a) simpler than html, (b) mnemonic, influenced by +mail/messaging/wiki markup practices, (c) human readable, and easily writable, + +(ii) (a) minimal markup requirement, (b) single file marked up for multiple outputs, + + * documents are prepared in a single UTF-8 file using a minimalistic mnemonic +syntax. Typical literature, documents like "War and Peace" require almost no +markup, and most of the headers are optional. + + * markup is easily readable/parsed by the human eye, (basic markup is simpler +and more sparse than the most basic html), [this may also be converted to XML +representations of the same input/source document]. + + * markup defines document structure (this may be done once in a header +pattern-match description, or for heading levels individually); basic text +attributes (bold, italics, underscore, strike-through etc.) as required; and +semantic information related to the document (header information, extended +beyond the Dublin core and easily further extended as required); the headers +may also contain processing instructions. + +(iii) (a) multiple output formats, including amongst others: plaintext (UTF-8); +html; (structured) XML; ODF (Open Document text); EPUB; LaTeX; PDF (via LaTeX); +SQL type databases (currently PostgreSQL and SQLite). SiSU produces: +concordance files; document content certificates (md5 or sha256 digests of +headings, paragraphs, images etc.) and html manifests (and sitemaps of +content). (b) takes advantage of the strengths implicit in these very different +output types, (e.g. 
PDFs produced using typesetting of LaTeX, databases +populated with documents at an individual object/paragraph level, making +possible granular search (and related possibilities)) + +(iv) outputs share a common numbering system (dubbed "object citation +numbering" (ocn)) that is meaningful (to man and machine) across various +digital outputs whether paper, screen, or database oriented, (PDF, html, XML, +EPUB, sqlite, postgresql), this numbering system can be used to reference +content. + +(v) SQL databases are populated at an object level (roughly headings, +paragraphs, verse, tables) and become searchable with that degree of +granularity, the output information provides the object/paragraph numbers which +are relevant across all generated outputs; it is also possible to look at just +the matching paragraphs of the documents in the database; [output indexing also +works well with search indexing tools like hyperestraier]. + +(vi) use of semantic meta-tags in headers permits the addition of semantic +information on documents, (the available fields are easily extended) + +(vii) creates organised directory/file structure for (file-system) output, +easily mapped with its clearly defined structure, with all text objects +numbered, you know in advance where in each document output type, a bit of text +will be found (e.g. from an SQL search, you know where to go to find the +prepared html output or PDF etc.)...
there is more; easy directory management +and document associations, the document preparation (sub-)directory may be used +to determine output (sub-)directory, the skin used, and the SQL database used, + +(viii) "Concordance file" wordmap, consisting of all the words in a document +and their (text/object) locations within the text, (and the possibility of +adding vocabularies), + +(ix) document content certification and comparison considerations: (a) the +document and each object within it stamped with an sha256 hash making it +possible to easily check or guarantee that the substantive content of a document +is unchanged, (b) version control, documents integrated with time based source +control system, default RCS or CVS with use of $Id$ tag, which SiSU checks + +(x) SiSU's minimalist markup makes for meaningful "diffing" of the substantive +content of markup-files, + +(xi) easily skinnable, document appearance on a project/site wide, directory +wide, or document instance level easily controlled/changed, + +(xii) in many cases a regular expression may be used (once in the document +header) to define all or part of a document's structure obviating or reducing +the need to provide structural markup within the document, + +(xiii) prepared files may be batch processed, documents produced are static files +so this needs to be done only once but may be repeated for various reasons as +desired (updated content, addition of new output formats, updated technology +document presentations/representations) + +(xiv) possible to pre-process, which permits: the easy creation of standard +form documents, and templates/term-sheets, or; building of composite documents +(master documents) from other sisu marked up documents, or marked up parts, +i.e. import documents or parts of text into a main document should this be +desired
+ +(xv) there is a considerable degree of future-proofing, output representations +are "upgradeable", and new document formats may be added: (a) modular, (thanks +in no small part to Ruby) another output format required, write another +module.... (b) easy to update output formats (eg html, XHTML, LaTeX/PDF +produced can be updated in program and run against whole document set), (c) +easy to add, modify, or have alternative syntax rules for input, should you +need to, + +(xvi) scalability, dependent on your file-system (ext3, Reiserfs, XFS, +whatever) and on the relational database used (currently Postgresql and +SQLite), and your hardware, + +(xvii) only marked up files need be backed up, to secure the larger document +set produced, + +(xviii) document management, + +(xix) Syntax highlighting for SiSU markup is available for a number of text +editors. + +(xx) remote operations: (a) run SiSU on a remote server, (having prepared sisu +markup documents locally or on that server, i.e. this solution where sisu is +installed on the remote server, would work whatever type of machine you chose +to prepare your markup documents on), (b) generated document outputs may be +posted by sisu to remote sites (using rsync/scp) (c) document source (plaintext +utf-8) if shared on the net may be identified by its url and processed locally +to produce the different document outputs. + +(xxi) document source may be bundled together (automatically) with associated +documents (multiple language versions or master document with inclusions) and +images and sent as a zip file called a sisupod, if shared on the net these too +may be processed locally to produce the desired document outputs, these may be +downloaded, shared as email attachments, or processed by running sisu against +them, either using a url or the filename. + +(xxii) for basic document generation, the only software dependency is Ruby, and +a few standard Unix tools (this covers plaintext, html, XML, ODF, EPUB, LaTeX). 
+To use a database you of course need that, and to convert the LaTeX generated +to PDF, a LaTeX processor like tetex or texlive. + +As a developer's tool it is flexible and extensible. + +*** description + +SiSU ("SiSU information Structuring Universe" or "Structured information, +Serialized Units") [1] is a Unix command line oriented framework for document +structuring, publishing and search, featuring minimalistic markup, multiple +standard outputs, a common citation system, and granular search. Using markup +applied to a document, SiSU can produce plain text, HTML, XHTML, XML, +OpenDocument, LaTeX or PDF files, and populate an SQL database with objects [2] +(equating generally to paragraph-sized chunks) so searches may be performed and +matches returned with that degree of granularity (e.g. your search criteria are +met by these documents and at these locations within each document). Document +output formats share a common object numbering system for locating content. +This is particularly suitable for "published" works (finalized texts as opposed +to works that are frequently changed or updated) for which it provides a fixed +means of reference of content. + +**** How it works + +SiSU markup is fairly minimalistic, it consists of: a (largely optional) +document header, made up of information about the document (such as when it was +published, who authored it, and granting what rights) and any processing +instructions; and markup within text which is related to document structure and +typeface. SiSU must be able to discern the structure of a document, (text +headings and their levels in relation to each other), either from information +provided in the instruction header or from markup within the text (or from a +combination of both).
Processing is done against an abstraction of the document +comprising information on the document's structure and its objects, [2] which +the program serializes (providing the object numbers) and which are assigned +hash sum values based on their content. This abstraction of information about +document structure, objects, (and hash sums), provides considerable flexibility +in representing documents in different ways and for different purposes (e.g. +search, document layout, publishing, content certification, concordance etc.), +and makes it possible to take advantage of some of the strengths of established +ways of representing documents, (or indeed to create new ones). + +[1] also chosen for the meaning of the Finnish term "sisu". + +[2] objects include: headings, paragraphs, verse, tables, images, but not +footnotes/endnotes which are numbered separately and tied to the object from +which they are referenced. + +More information on SiSU provided at: + +SiSU was developed in relation to legal documents, and is strong across a wide +variety of texts (law, literature...(humanities, law and part of the social +sciences)). SiSU handles images but is not suitable for formulae/statistics, +or for technical writing at this time. + +SiSU has been developed and has been in use for several years. Requirements to +cover a wide range of documents within its use domain have been explored. + + + + + +2010 +w3 since October 3 1993 +** Finding +*** source +http://git.sisudoc.org/gitweb/ + +sisu git repo: +http://git.sisudoc.org/gitweb/?p=code/sisu.git;a=summary + +sisu-markup-samples git repo: +http://git.sisudoc.org/gitweb/?p=doc/sisu-markup-samples.git;a=summary + +*** mailing list +sisu at lists.sisudoc.org +http://lists.sisudoc.org/listinfo/sisu + +** irc oftc #sisu + +** home pages + + + + +** Installing sisu + +*** where you take responsibility for having the correct dependencies + +Provided you have *Ruby*, *SiSU* can be run.
+ +SiSU should be run from the directory containing your sisu marked up document +set. + +This works fine so long as you already have sisu external dependencies in +place. For many operations such as html, epub, odt this is likely to be fine. +Note however, that additional external package dependencies, such as texlive +(for pdfs), sqlite3 or postgresql (for search) should you desire to use them +are not taken care of for you. + +**** run off the source tarball without installation + +RUN OFF SOURCE PACKAGE DIRECTORY TREE (WITHOUT INSTALLING) +.......................................................... + +***** 1. Obtain the latest sisu source + +using git: + +http://git.sisudoc.org/gitweb/?p=code/sisu.git;a=summary +http://git.sisudoc.org/gitweb/?p=code/sisu.git;a=log + + git clone git://git.sisudoc.org/git/code/sisu.git + +or, identify latest available source: + +https://packages.debian.org/sid/sisu +http://packages.qa.debian.org/s/sisu.html +http://qa.debian.org/developer.php?login=sisu@lists.sisudoc.org + +http://sisudoc.org/sisu/archive/pool/main/s/sisu/ + +and download the: + + sisu_5.4.5.orig.tar.xz + +using debian tool dget: + +The dget tool is included within the devscripts package +https://packages.debian.org/search?keywords=devscripts +to install dget install devscripts: + + apt-get install devscripts + +and then you can get it from Debian: + dget -xu http://ftp.fi.debian.org/debian/pool/main/s/sisu/sisu_5.4.5-1.dsc + +or off sisu repos + dget -x http://www.jus.uio.no/sisu/archive/pool/main/s/sisu/sisu_5.4.5-1.dsc +or + dget -x http://sisudoc.org/sisu/archive/pool/main/s/sisu/sisu_5.4.5-1.dsc + +***** 2. Unpack the source + +Provided you have *Ruby*, *SiSU* can be run without installation straight from +the source package directory tree. + +Run ruby against the full path to bin/sisu (in the unzipped source package +directory tree). SiSU should be run from the directory containing your sisu +marked up document set. 
+ + ruby ~/sisu-5.4.5/bin/sisu --html -v document_name.sst + +This works fine so long as you already have sisu external dependencies in +place. For many operations such as html, epub, odt this is likely to be fine. +Note however, that additional external package dependencies, such as texlive +(for pdfs), sqlite3 or postgresql (for search) should you desire to use them +are not taken care of for you. + +**** gem install (with rake) + +(i) create the gemspec; (ii) build the gem (from the gemspec); (iii) install +the gem + +Provided you have ruby & rake, this can be done with the single command: + + rake gem_create_build_install + +to build and install sisu v5 & sisu v6, alias gemcbi + +separate gems are made/installed for sisu v5 & sisu v6 contained in source. + +to build and install sisu v5, alias gem5cbi: + + rake gem_create_build_install_stable + +to build and install sisu v6, alias gem6cbi: + + rake gem_create_build_install_unstable + +for individual steps (create, build, install) see rake options, rake -T + +to specify sisu version for sisu installed via gem: + + gem search sisu + + sisu _5.4.5_ --version + + sisu _6.0.11_ --version + +to uninstall sisu installed via gem + + sudo gem uninstall --verbose sisu + +For a list of alternative actions you may type: + + rake help + + rake -T + +Rake: + +**** installation with setup.rb + +this is a three step process; in the root directory of the unpacked *SiSU* +type (the install step as root): + +ruby setup.rb config +ruby setup.rb setup +#[as root:] +ruby setup.rb install + +further information: + + + + ruby setup.rb config && ruby setup.rb setup && sudo ruby setup.rb install + +*** Debian install + +*SiSU* is available off the *Debian* archives.
It should be necessary only to run, +as root, using apt-get: + + apt-get update + + apt-get install sisu-complete + +(all sisu dependencies should be taken care of) + +If there are newer versions of *SiSU* upstream, they will be available by +adding the following to your sources list /etc/apt/sources.list + +#/etc/apt/sources.list + +deb http://www.jus.uio.no/sisu/archive unstable main non-free +deb-src http://www.jus.uio.no/sisu/archive unstable main non-free + +The non-free section is for sisu markup samples provided, which contain +authored works the substantive text of which cannot be changed, and which as a +result do not meet the debian free software guidelines. + +*SiSU* is developed on *Debian*, and packages are available for *Debian* that +take care of the dependencies encountered on installation. + +The package is divided into the following components: + + *sisu*, the base code, (the main package on which the others depend), without + any dependencies other than ruby (and for convenience the ruby webrick web + server), this generates a number of types of output on its own, other + packages provide additional functionality, and have their dependencies + + *sisu-complete*, a dummy package that installs the whole of greater sisu as + described below, apart from sisu -examples + + *sisu-pdf*, dependencies used by sisu to produce pdf from /LaTeX/ generated + + *sisu-postgresql*, dependencies used by sisu to populate postgresql database + (further configuration is necessary) + + *sisu-sqlite*, dependencies used by sisu to populate sqlite database + + *sisu-markup-samples*, sisu markup samples and other miscellany (under + *Debian* Free Software Guidelines non-free) + + *SiSU* is available off Debian Unstable and Testing; + install it using apt-get, aptitude or alternative *Debian* install tools.
+ +** sisu markup :sisu:markup: + +*** sisu markup + +#% structure - headings, levels + * headings (A-D, 1-3) + * inline + 'A~ ' NOTE title level + 'B~ ' NOTE optional + 'C~ ' NOTE optional + 'D~ ' NOTE optional + '1~ ' NOTE chapter level + '2~ ' NOTE optional + '3~ ' NOTE optional + '4~ ' NOTE optional :consider: + * node + * parent + * children + +#% font face NOTE open & close marks, inline within paragraph + * emphasize '*{ ... }*' NOTE configure whether bold italics or underscore, default bold + * bold '!{ ... }!' + * italics '/{ ... }/' + * underscore '_{ ... }_' + * superscript '^{ ... }^' + * subscript ',{ ... },' + * strike '-{ ... }-' + * add '+{ ... }+' + * monospace '#{ ... }#' +#% para NOTE paragraph controls are at the start of a paragraph + * a para is a block of text separated from others by an empty line + * indent + * default, all '_1 ' up to '_9 ' + * first line hang '_1_0 ' + * first line indent further '_0_1 ' + * bullet + [levels 1-6] + '_* ' + '_1* ' + '_2* ' + * numbered list + [levels 1-3] + '# ' + +#% blocks NOTE text blocks that are not to be treated in the way that ordinary paragraphs would be + * code + * [type of markup if any] + * poem + * group + * alt + * tables +#% boxes + NOTE grouped text with code block type color & possibly default image, warning, tip, red, blue etc. decide [NB N/A not implemented] + +#% notes NOTE inline within paragraph at the location where the note reference is to occur + * footnotes '~{ ... }~' + * [bibliography] [NB N/A not implemented] + +#% links, linking + * links - external, web, url + * links - internal + +#% images [multimedia?] 
+ * images + * [base64 inline] [N/A not implemented] + +#% object numbers + * ocn (object numbers) + automatically attributed to substantive objects, paragraphs, tables, blocks, verse (unless exclude marker provided) + +#% contents + * toc (table of contents) + autogenerated from structure/headings information + * index (book index) + built from hints in newline text following a paragraph and starting with ={} has identifying rules for main and subsidiary text + +#% breaks + * line break ' \\ ' inline + * page break, column break ' -\\- ' start of line, breaks a column, starts a new column, if using columns, else breaks the page, starts a new page. + * page break, page new ' =\\= ' start of line, breaks the page, starts a new page. + * horizontal '-..-' start of line, rule page (break) line across page (dividing paragraphs) + +#% book type index + +#% comment + * comment + +#% misc + * term & definition + +*** syntax highlighting + +**** vim +data/sisu/conf/editor-syntax-etc/vim/ +data/sisu/conf/editor-syntax-etc/vim/syntax/sisu.vim + +**** emacs +data/sisu/conf/editor-syntax-etc/emacs/ +data/sisu/conf/editor-syntax-etc/emacs/sisu-mode.el +** todo +sisu_todo.org