Subject: Re: HTML->DocBook? (Re: HTML -> RTF)
From: Oisin McGuinness <oisin@xxxxxxxx>
Date: Wed, 24 May 2000 12:11:47 -0400
Since Gary indicated that deja.com has no comp.text.sgml archives from last year, I'm reposting Christopher Browne's article from last year. Please excuse the length; it seems to be wanted...

Everything between <quote> and </quote> is his. I have not tested this extensively. I didn't find a newer version of this on his web site (given in the .sig at the end).

Oisin McGuinness
oisin@xxxxxxxx

<quote>
>From - Fri Jun 11 12:57:55 1999
Xref: news2.new-york.net comp.text.sgml:15395
Path: news2.new-york.net!news.new-york.net!news-peer.gip.net!news.gsl.net!gip.net!newsfeed.cwix.com!207.207.0.27!nntp2.giganews.com!news2.giganews.com.POSTED!cbbrowne
From: cbbrowne@xxxxxxxxxxxx (Christopher Browne)
Newsgroups: comp.text.sgml
Subject: Re: Search for Holy Grail: {html,ps,text}2sgml
References: <7jobbo$2urc$1@xxxxxxx> <7joh0t$337$2@xxxxxxxxxxxxxxxxxxxx> <7joif7$ok$1@xxxxxxx>
Reply-To: cbbrowne@xxxxxxx
X-Newsreader: slrn (0.9.5.1 Windows)
Lines: 188
Message-ID: <TBY73.3916$_m4.78408@xxxxxxxxxxxxxxxxxx>
NNTP-Posting-Date: Thu, 10 Jun 1999 19:18:59 CDT
Organization: Giganews.Com - Premium News Outsourcing
X-Trace: sv1-sR9dUz5KKnfUgtyW/83eo35REq2btQPNoz54fJRvop128On8quxwsI2oWRUyCF7C9EL954AEHFrduWO!UEMnKV2D7tk=
X-Complaints-To: abuse@xxxxxxxxxxxx
X-Abuse-Info: Please be sure to forward a copy of ALL headers
X-Abuse-Info: Otherwise we will be unable to process your complaint properly
Date: Fri, 11 Jun 1999 00:18:59 GMT

On 10 Jun 1999 14:35:19 GMT, Marc G. Fournier <scrappy@xxxxxxx> wrote:
>jdassen@xxxxxxxxxxxxxxxx (J.H.M. Dassen (Ray)) writes:
>>Marc G. Fournier <scrappy@xxxxxxx> wrote:
>>> I swear, I'm searching for the Holy Grail here, its about as
>>>impossible...:)
>
>>It is impossible, in any meaningful sense.
>
>>SGML is about document structure. PostScript, plain ASCII and to some
>>degree
>
>How is it that the html2sgml(linuxdoc) converter works then?

It's impossible to come up with a provably complete system that will make the SGML document "colloquial" for its DTD.
I use the DSSSL listed below to turn HTML that uses a small subset of the available HTML tags into something that's pretty easy to integrate into DocBook.

It's useful enough for expressing the very limited structuring that HTML provides, essentially being aware of:
 a) Headings <H1>, <H2>, ...
 b) Paragraphs
 c) Some modifiers (<TT>, <B>)
 d) Itemized lists
 e) URLs

That's a tiny subset of HTML, and it is mapped onto a tiny subset of what DocBook offers. It happens to be enough to be fairly useful, but I'd not call it a complete "conversion."

And converting documents in, say, PostScript, where it may not even be possible to group anything larger than individual lines of text, into SGML *or any other structured system* is nigh unto impossible.

<!doctype style-sheet PUBLIC "-//James Clark//DTD DSSSL Style Sheet//EN">

(define debug
  (external-procedure "UNREGISTERED::James Clark//Procedure::debug"))

(declare-flow-object-class element
  "UNREGISTERED::James Clark//Flow Object Class::element")
(declare-flow-object-class empty-element
  "UNREGISTERED::James Clark//Flow Object Class::empty-element")
(declare-flow-object-class document-type
  "UNREGISTERED::James Clark//Flow Object Class::document-type")
(declare-flow-object-class processing-instruction
  "UNREGISTERED::James Clark//Flow Object Class::processing-instruction")
(declare-characteristic preserve-sdata?
  "UNREGISTERED::James Clark//Characteristic::preserve-sdata?"
  #t)

(define (copy-attributes #!optional (nd (current-node)))
  (let loop ((atts (named-node-list-names (attributes nd))))
    (if (null? atts)
        '()
        (let* ((name (car atts))
               (value (attribute-string name nd)))
          (if value
              (cons (list name value)
                    (loop (cdr atts)))
              (loop (cdr atts)))))))

(default
  (if (node-property 'momitend (current-node))
      (make empty-element attributes: (copy-attributes))
      (make element attributes: (copy-attributes))))

(element HTML
  (make sequence
    (make document-type
      name: "ARTICLE"
      public-id: "-//Davenport//DTD DocBook V3.0//EN")
    (process-children)))

(element article (make element))
(element title (make element))
(element head (make element gi: "Artheader"))
(element BODY (make element gi: "Para"))
(element h1 (make element gi: "Sect1"))
(element h2 (make element gi: "Sect2"))
(element h3 (make element gi: "Sect3"))
(element h4 (make element gi: "Sect4"))
(element h5 (make element gi: "Sect5"))
(element heading (make element gi: "Title"))
(element p (make element gi: "Para"))
(element tt (make element gi: "Literal"
                  attributes: `(("remap" "tt")))) ;; fixme
(element tscreen (process-children)) ; FIXME
(element ul (make element gi: "ItemizedList"))
(element li (make element gi: "ListItem"
                  (make element gi: "Para")))

(element URL
  (make element gi: "ULink"
        attributes: `(("URL" ,(attribute-string "URL")))
        (if (attribute-string "NAME")
            (literal (attribute-string "NAME"))
            (literal (attribute-string "URL")))))

(element IMG
  ;; Only SRC is mapped (to Fileref); copying the raw HTML attributes
  ;; would emit names such as SRC that DocBook does not define.
  (make element gi: "Inlinegraphic"
        attributes: `(("Fileref" ,(attribute-string "SRC")))))

(element A
  ;; HREF becomes the ULink URL; the raw HTML attributes are not copied.
  (if (attribute-string "HREF")
      (make element gi: "Ulink"
            attributes: `(("URL" ,(attribute-string "HREF"))))
      (make element gi: "Anchor"
            attributes: `(("ID" ,(attribute-string "NAME"))))))

(element label (make empty-element gi: "Anchor"
                     attributes: (copy-attributes)))
(element ol (make element gi: "OrderedList"))
(element em (make element gi: "Emphasis"))
(element bf (make element gi: "Literal"
                  attributes: `(("remap" "bf"))))
(element pre (make element gi: "ProgramListing"))
(element quotep (process-children))

(element dl
  (make element gi: "GlossList"
        (process-matching-children "DT")))

(define (get-sibs)
  (let loop ((rest (follow (current-node)))
             (accum (empty-sosofo)))
    (let ((tag (gi (node-list-first rest))))
      (if (or (not tag) (string=? tag "DT"))
          accum
          (loop (node-list-rest rest)
                (sosofo-append accum
                               (process-node-list (node-list-first rest))))))))

(element DT
  (make element gi: "GlossEntry"
        (make element gi: "GlossTerm")
        (make element gi: "GlossDef"
              (get-sibs))))

(element BR (make element gi: "Emphasis"))

-- 
OS/2: Why marketing matters more than technology...
cbbrowne@xxxxxxxx <http://www.ntlug.org/~cbbrowne/sgml.html>
</quote>

DSSSList info and archive:  http://www.mulberrytech.com/dsssl/dssslist
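The DL rules are the one place where the stylesheet does real restructuring: HTML gives terms and definitions as flat siblings, while DocBook's GlossEntry nests them. For anyone who wants to reason about that grouping outside Jade, here is a minimal Python sketch of what get-sibs does, assuming the children of a <DL> have already been parsed into a flat list of (tag, text) pairs. The function names are hypothetical, and wrapping each definition in Para is an assumption of the sketch (the stylesheet itself leaves DD to its default rule).

```python
# Hypothetical sketch of the DT/DD grouping done by get-sibs in the DSSSL
# stylesheet; not part of the original. Assumes a <DL>'s children are
# already available as a flat list of (tag, text) pairs.
from xml.sax.saxutils import escape


def group_glossary(children):
    """Pair each DT with the siblings that follow it, up to the next DT."""
    entries = []
    for tag, text in children:
        if tag.upper() == "DT":
            # A DT starts a new GlossEntry: (term, list of definitions).
            entries.append((text, []))
        elif entries:
            # As in get-sibs, siblings after a DT (up to the next DT)
            # belong to that DT.
            entries[-1][1].append(text)
        # Siblings before the first DT are skipped, much as
        # (process-matching-children "DT") never reaches them.
    return entries


def dl_to_glosslist(children):
    """Render the grouped entries as DocBook GlossList markup."""
    parts = ["<GlossList>"]
    for term, defs in group_glossary(children):
        parts.append("<GlossEntry><GlossTerm>%s</GlossTerm><GlossDef>"
                     % escape(term))
        # Assumption: each definition is wrapped in a Para element.
        parts.extend("<Para>%s</Para>" % escape(d) for d in defs)
        parts.append("</GlossDef></GlossEntry>")
    parts.append("</GlossList>")
    return "".join(parts)


if __name__ == "__main__":
    sample = [("DT", "DSSSL"), ("DD", "Style language"), ("DD", "Also transforms")]
    print(dl_to_glosslist(sample))
```

Run as a script, this prints one GlossEntry whose GlossDef holds two Para elements. Collecting siblings until the next DT, rather than pairing DT and DD one-to-one, is what lets several DDs share one term, which is the same choice get-sibs makes.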