PHP Strip_tags

9 posts by 3 authors in: Forums > CMS Builder
Last Post: February 24, 2013

By design9 - January 22, 2013

I am using the following max words coding:

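<?php
// Return the first $maxWords whitespace-separated words of $textOrHtml.
function maxWords($textOrHtml, $maxWords) {
  $text = $textOrHtml;
  // Split into at most $maxWords + 1 pieces; the last piece holds the remainder.
  $words = preg_split("/\s+/", $text, $maxWords + 1);
  if (count($words) > $maxWords) { unset($words[$maxWords]); }
  $output = join(' ', $words);

  return $output;
}
?>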

<?PHP echo maxWords($record['content'], 150);?>

I know how to use strip_tags() to strip all the HTML from the text. However, in this instance I only want to strip the <img> tags from the content and keep the rest of the HTML, such as line breaks, paragraphs, links, etc.

Is there a way to do that? Also, the max words output is pulling in strange characters that are not converting to HTML. Is there a way to fix those as well, perhaps with htmlspecialchars()?

Thanks!

April

By gregThomas - January 23, 2013

Hi,

You can use this preg_replace function to only remove images:

<?php echo preg_replace("/<img[^>]+\>/i", "", maxWords($record['content'], 150)); ?>
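
One thing to watch with this ordering: maxWords() also counts the words inside an <img> tag's attributes, and if the cut-off lands in the middle of a tag, there is no closing > left for the pattern to match. A minimal sketch that strips images first and truncates afterwards, assuming the maxWords() function from the first post. The helper name stripImages() and the sample string are made up for illustration:

```php
<?php
// maxWords() from the first post, repeated here so the sketch runs on its own.
function maxWords($textOrHtml, $maxWords) {
  $words = preg_split("/\s+/", $textOrHtml, $maxWords + 1);
  if (count($words) > $maxWords) { unset($words[$maxWords]); }
  return join(' ', $words);
}

// Hypothetical helper: remove <img> tags only, leaving other HTML alone.
// Assumes no attribute value contains a literal ">" character.
function stripImages($html) {
  return preg_replace("/<img\b[^>]*>/i", "", $html);
}

// Sample content standing in for $record['content'].
$content = '<p>Hello <img src="a.jpg" alt="a b c" /> world, this is a <a href="#">link</a>.</p>';

// Strip images first, then limit the word count.
echo maxWords(stripImages($content), 4);
```

Note that cutting HTML at a word boundary can still leave tags such as <p> unclosed, so the output may need a closing tag added by the template.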

Could you give me an example of the strange characters that are being pulled in?

Thanks!

Greg

Greg Thomas
PHP Programmer - interactivetools.com

By design9 - January 24, 2013

Here is a screenshot of the strange character. It looks like a "?". 

Thanks!

April

Attachments:

screenshot-characters.jpg 486K

By gregThomas - January 24, 2013

Hi April,

The question mark blocks usually appear when a browser is unsure how to display characters because it doesn't recognize the encoding. CMS Builder stores its data using the UTF-8 character set. Could you check the header of your page to see if the page is displaying in UTF-8? Your header should look something like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
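
The meta tag is only one of the places an encoding can be declared. If the server sends a Content-Type header with a different charset, browsers generally give the header priority. A minimal sketch for ruling that out, assuming the page is a PHP template, the mbstring extension is available, and nothing has been output yet. The function name isValidUtf8() is hypothetical:

```php
<?php
// Declare UTF-8 in the HTTP header; this must run before any output is sent.
if (!headers_sent()) {
  header('Content-Type: text/html; charset=utf-8');
}

// Hypothetical helper: true if the string is well-formed UTF-8.
function isValidUtf8($text) {
  return mb_check_encoding($text, 'UTF-8');
}

$good = "caf\xC3\xA9"; // "café" encoded as UTF-8
$bad  = "caf\xE9";     // "café" encoded as Windows-1252 / Latin-1

var_dump(isValidUtf8($good));
var_dump(isValidUtf8($bad));
```

Running a check like this over a record's content can show whether the stored bytes or the page declaration is the likelier culprit.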

Let me know if you have any questions.

Thanks!

Greg


By design9 - February 1, 2013

Yes, I am using UTF-8. However, I still get those characters. Do you know any other reason that would cause it or how to strip those characters?

Thanks!

April

By gregThomas - February 1, 2013

Hi April,

Have you copied the text into the WYSIWYG editor from another source, for example a Word document? This can sometimes import characters in the wrong encoding. Also, have you tried editing a record, deleting the character that isn't displayed correctly, and then typing it back in?
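
Text pasted from Word often carries curly quotes and dashes, and if those bytes arrive as Windows-1252 rather than UTF-8, a UTF-8 page shows them as question-mark blocks. A rough sketch of a repair step, assuming the mbstring extension and assuming that any invalid bytes really are Windows-1252 (that guess can be wrong). toUtf8() is a hypothetical name:

```php
<?php
// Hypothetical helper: leave valid UTF-8 alone, otherwise treat the
// input as Windows-1252 and convert it to UTF-8.
function toUtf8($text) {
  if (mb_check_encoding($text, 'UTF-8')) {
    return $text;
  }
  return mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
}

// 0x92 is the right single quote (apostrophe) in Windows-1252.
$pasted = "It\x92s here";
$fixed  = toUtf8($pasted);

echo $fixed;
```

This only helps when the bad bytes are still in the data as stored; if the characters were already replaced with "?" when the record was saved, the record has to be re-entered.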

Could you provide a link to the page this is happening on?

Thanks!

Greg


By gregThomas - February 4, 2013

Hi April,

Just had another look at the page and noticed the question mark symbols have gone. Did you manage to resolve the problem? Or did you manually change all of the records? 

Thanks!

Greg


By Codee - February 24, 2013

I had a client with several editorialists committing the same crime (er, uh, "having the same problem") when pasting from Word and WordPerfect. It would also occur from time to time with some non-apostrophe characters, such as a copyright symbol, maybe from an unusual font. A fix that worked very well for them (and especially for the boss client, who didn't want to fix all his editorialists' work every week) was to add a step whenever they pasted copied text into the system: first paste it into Notepad or another plain text editor without Word-style special functions (like UltraEdit). These basic text editors strip out all special formatting so that only true text characters are left. Then copy it from Notepad into the CMSB text boxes or WYSIWYG boxes.
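
Where retraining every editor is not practical, a rough clean-up can also be attempted on the server. This is not the same thing as the Notepad step, which removes formatting rather than characters; it is a minimal sketch that swaps common typographic punctuation for plain equivalents, assuming the text is already valid UTF-8. plainPunctuation() and its replacement table are illustrative choices only:

```php
<?php
// Hypothetical helper: replace common "smart" punctuation (given as
// UTF-8 byte sequences) with plain ASCII equivalents.
function plainPunctuation($text) {
  $map = array(
    "\xE2\x80\x98" => "'",   // left single quote
    "\xE2\x80\x99" => "'",   // right single quote / apostrophe
    "\xE2\x80\x9C" => '"',   // left double quote
    "\xE2\x80\x9D" => '"',   // right double quote
    "\xE2\x80\x93" => "-",   // en dash
    "\xE2\x80\x94" => "-",   // em dash
    "\xE2\x80\xA6" => "...", // ellipsis
  );
  return strtr($text, $map);
}

// Sample input: curly quotes, a curly apostrophe and an ellipsis.
$sample = "\xE2\x80\x9CIt\xE2\x80\x99s done\xE2\x80\xA6\xE2\x80\x9D";

echo plainPunctuation($sample);
```

Characters not listed in the table, such as a copyright symbol, pass through unchanged.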