Using PHP to HTML-ize Plain Text

I used to write my web site purely as HTML. Whenever I needed to post something as plain text (for instance, a resume on a job site), I would just cut and paste the text from my web browser into the plain text field. But that way, the links got lost. Enough people asked me for my links that I started working in the other direction: write the plain text first, then convert it to HTML. I use the PHP scripting language for this.

I based my converter on functions from snippets.dzone.com. Their main function is, of course, "txt2html". I tweaked that heavily.
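
For readers who have not seen a converter of this kind, here is a minimal sketch of the basic plain-text-to-HTML step. It is not the snippets.dzone.com function; the name plainToHtmlSketch and its paragraph rules are assumptions made for illustration. It only escapes the text and wraps blank-line-separated blocks in paragraph tags, using the standard PHP functions htmlspecialchars, preg_split, and nl2br.

```php
<?php
// Hypothetical helper for illustration only (not the dzone "txt2html").
function plainToHtmlSketch(string $text): string
{
    // Escape &, <, > and quotes so the text cannot inject markup.
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Normalize line endings, then split on runs of blank lines.
    $escaped = str_replace(["\r\n", "\r"], "\n", $escaped);
    $blocks = preg_split('/\n{2,}/', trim($escaped));

    $html = '';
    foreach ($blocks as $block) {
        // Single line breaks inside a paragraph become <br /> tags.
        $html .= '<p>' . nl2br($block) . "</p>\n";
    }
    return $html;
}

echo plainToHtmlSketch("Ham & eggs\nare <good>.\n\nSecond paragraph.");
```

Link handling would then run on the escaped text, which is the part the rest of this post deals with.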

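The main problem inside "txt2html" was converting my links to HTML. I wanted to treat different file types differently: .htm and .php pages become relative links, .txt files are routed through page.php, .jpg and .gif images are shown inline inside a link, and everything else (including .aspx) stays an absolute http link.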

To recognize the links, I used regular-expression pattern matching. Getting the pattern right was the challenge. RFC 3986 at www.ietf.org gave me the official pattern, but I had to tune it for PHP. I wound up with:

/* Handle links in a sophisticated manner which won't break for, say,
 * http://www.eilertech.com/stories/powernaut/1941.htm#1.
 * Otherwise, why bother.  Pattern rules from
 * http://www.ietf.org/rfc/rfc3986.txt. Straight from the spec:
	  ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
	   12            3  4          5       6  7        8 9 */
/* I found original patterns 2 and 4 find nothing, and 5 and 7 expand 
 * until they find question marks.  So I tweaked them. */
//$originalpattern =
//  "|(([^:/\?#]+):)(//([^/\?#]*))([^?#]*)(\?([^#]*))?(#(.*))?|i";

/* General rules for replacing images.  $filename must already be set. */
$imgReplacement =
	"<" . "a href=\"../..$5$6$7$8\"><"
	. "img align=\"right\" width=\"180\" src=\"../..$5$6$7$8\""
	. " alt=\"Image for $filename\"><" . "/a>";

/* Rules per supported file type.  Attribute values are quoted and
 * every link is closed. */
$extArray = array (
	".htm" => "<" . "a href=\"../..$5$6$7$8\">$4$5$6$7$8<" . "/a>",
	".php" => "<" . "a href=\"../..$5$6$7$8\">$4$5$6$7$8<" . "/a>",
	".txt" => "<" . "a href=\"../..$5" . "page.php?fn=$6$7$8&amp;tl=Link\">$4$5$6$7$8<" . "/a>",
	".jpg" => $imgReplacement,
	".gif" => $imgReplacement,
	".aspx" => "<" . "a href=\"http://$4$5$6$7$8\">$4$5$6$7$8<" . "/a>",
	"" => "<" . "a href=\"http://$4$5$6$7$8\">$4$5$6$7$8<" . "/a>");
/* $1 = http:
 * $2 = http
 * $3 = //www.eilertech.com
 * $4 = www.eilertech.com
 * $5 = /stories/powernaut/ 
 * $6 = 1941
 * $7 = .htm
 * $8 = #1
 * $9 = 1
 * Excluded:  ?fn=britannia_beach.txt */ 
 
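// For each supported file type, up to and including blank
foreach ($extArray as $ext => $replacement) {

  // Define the search pattern here.  The two lookbehinds skip URLs that
  // an earlier pass already wrote into an href (the .aspx rule emits
  // "http://" again); preg_quote() keeps the "." in $ext literal.
  $pattern =
  "|(?<!href=)(?<!href=\")"
  . "((http):)(//([^/?# ]*))([^?# ,\.\)]*/)([^\.]*)?(" . preg_quote($ext, "|")
  // 12       3  4          5               6        7
  . "[^# ,\)]*)(#([^ ,\.]*))?|i";
  //           8 9

  /* We have the pattern, the replacement, and the HTML being built;
   * do the replacement. */
  $html = preg_replace($pattern, $replacement, $html);
}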
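
To see why the spec pattern needs tuning before it can be used on running text, here is a small self-contained sketch. It runs the RFC 3986 pattern (without its leading ^) and then the tuned ".htm" pattern against one sample sentence. The sample sentence, the variable names, and the quoted, closed form of the replacement string are assumptions for illustration, not part of the converter itself.

```php
<?php
// Sample sentence built from the URL used in the listing's comments.
$text = 'See http://www.eilertech.com/stories/powernaut/1941.htm#1. Otherwise, why bother.';

// RFC 3986 Appendix B pattern, with the leading ^ dropped so it can
// be tried on running text.
$specPattern = '~(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';
preg_match($specPattern, $text, $spec);
// The scheme group swallows the preceding word, and the fragment group
// runs to the end of the line:
//   $spec[2] is "See http"
//   $spec[9] is "1. Otherwise, why bother."

// Tuned pattern for ".htm" links: stops at spaces, commas, and the
// period that ends the sentence.
$ext = '.htm';
$tunedPattern = '|((http):)(//([^/?# ]*))([^?# ,\.\)]*/)([^\.]*)?('
    . preg_quote($ext, '|')
    . '[^# ,\)]*)(#([^ ,\.]*))?|i';
preg_match($tunedPattern, $text, $tuned);
//   $tuned[0] is "http://www.eilertech.com/stories/powernaut/1941.htm#1"
//   $tuned[5] is "/stories/powernaut/", $tuned[6] is "1941",
//   $tuned[7] is ".htm", $tuned[8] is "#1"

// Illustrative replacement in the style of the ".htm" rule.
$replacement = '<a href="../..$5$6$7$8">$4$5$6$7$8</a>';
$linked = preg_replace($tunedPattern, $replacement, $text);

printf("%s\n%s\n%s\n", $spec[2], $tuned[0], $linked);
```

The sentence-ending period stays outside the link because the tuned fragment group excludes periods, which is the case the comment at the top of the listing worries about.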
About the Author: 

Scott Eiler has worked for decades in all aspects of software engineering, in public and private sectors, in many different industries, on projects most people know by name, as employee, vendor, and now consultant. He also maintains his own diverse web site, including much commentary. Scott knows that engineering is more than just hacking out code.

