PHP: remove tags and the content between them, and limit to 300 characters

I am trying to remove specific HTML tags such as <h3>content</h3> (including the text between them), plus <br>, &nbsp; and whitespace, from text that is stored in a database. I also want to limit the text to 300 characters, followed by a Read More link that scrolls to the full-length description a bit further down the page.

I have put this code together from various forum posts and tutorials. It’s not pretty, but it sort of works. For example, it works on this product page:

https://www.it-doneright.co.uk/shop/laptops-tablets/Laptops/lenovo-100e-chromebook-g2-laptop-11-6-celeron-n4020-4gb-32gb-emmc-webcam-wi-fi-no-lan-usb-c-chrome-os

but on this product page:

https://www.it-doneright.co.uk/shop/components/Graphics-Cards/asrock-intel-arc-a310-lp-4g-pcie4-4gb-ddr6-hdmi-dp-2000mhz-clock-0db-cooling-low-profile

it leaves ul> showing. The ul> should not be displayed: it should be a proper <ul> tag that is only seen in the source code, not on the page, so the list still renders as a list.

Below is the code I currently have (sorry, I know it’s not pretty):

<h2 class="productsummaryheading mb-0">Product Summary</h2>
<?php
$html= substr($description,0,strrpos(substr($description,0,300)," "));
$final = preg_replace('#<h3>(.*?)</h3>#', '', $html, 1);
$final = preg_replace('~\x{00a0}~siu',' ',$final);
$final = trim(preg_replace('/\s+/', ' ', $final));
$final = strip_tags($final, ['ul', '&gt;ul&gt;', 'li']);
echo trim($final, ' <br> <br>');
?> ...<a href="<?php echo $_SERVER["REQUEST_URI"]; ?>#productdescription">Read More</a>
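One thing worth checking in this code: trim()'s second argument is a list of characters, not a string to remove. ' <br> <br>' is really the set of characters space, <, b, r and >, so if the summary happens to start with <ul>, trim() eats the < and leaves ul> on the page. That is a plausible explanation for the stray ul>, though it depends on what the stored description actually starts with. The sketch below (sample strings made up) shows the effect, plus one regex-based way to strip whole leading and trailing <br> tags, spaces and &nbsp; entities instead.

```php
<?php
// trim()'s second argument is a SET of characters, not a string to remove:
// ' <br> <br>' means { ' ', '<', 'b', 'r', '>' }.
$summary = '<ul><li>4GB GDDR6</li></ul>';
$trimmedWithMask = trim($summary, ' <br> <br>');
// The leading '<' of <ul> and the trailing '>' of </ul> are both in that set,
// so both are removed, which leaves "ul>" at the start of the output.

// To remove whole <br> tags, spaces and &nbsp; entities from both ends,
// anchor a pattern at the start (^) and at the end ($) instead.
$withBreaks = '<br> &nbsp;<ul><li>4GB GDDR6</li></ul><br>';
$cleaned = preg_replace('~^(?:\s|&nbsp;|<br\s*/?>)+|(?:\s|&nbsp;|<br\s*/?>)+$~i', '', $withBreaks);
```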

strip_tags expects a string as its second parameter:

$final = strip_tags($final, '<ul><li>');

Thank you. I replaced the following line:

$final = strip_tags($final, ['ul', '&gt;ul&gt;', 'li']);

with your code:

$final = strip_tags($final, '<ul><li>');

It’s still outputting ul> on the product page https://www.it-doneright.co.uk/shop/components/Graphics-Cards/asrock-intel-arc-a310-lp-4g-pcie4-4gb-ddr6-hdmi-dp-2000mhz-clock-0db-cooling-low-profile

The second parameter of strip_tags lists the tags you DON’T want the function to eat; it will eat all other tags. It can also be an array of tag names such as ['ul', 'li'] (PHP 7.4+) or null (PHP 8+).

It also won’t eat malformed tags: ul> is missing its <, so it isn’t a tag anymore.
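To illustrate both points with made-up sample strings: the allowed tags can be given as a string like '<ul><li>' or, from PHP 7.4, as an array of bare tag names such as ['ul', 'li'] (no angle brackets and no &gt; entities). strip_tags() also keeps the text inside the tags it removes, which is why the <h3> heading text needs a separate step, and a fragment that has already lost its < passes through untouched.

```php
<?php
$html = '<h3>Title</h3><ul><li>One</li></ul><br>';

// Allowed tags as a string (works in all supported PHP versions).
$asString = strip_tags($html, '<ul><li>');

// Allowed tags as an array of bare tag names (PHP 7.4+).
$asArray = strip_tags($html, ['ul', 'li']);

// Both keep the text "Title": strip_tags() removes tags, not their content.

// "ul>" without its "<" is plain text to strip_tags(), so it is left alone.
$broken = strip_tags('ul><li>One</li></ul>', '<ul><li>');
```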

I found the line causing the ul> issue and removed it, so I now have the following code:

<?php
$html= substr($description,0,strrpos(substr($description,0,300)," "));
$final = preg_replace('#<h3>(.*?)</h3>#', '', $html, 1);
$final = strip_tags($final, '<ul><li>');
echo trim($final, ' <br> <br>');
?>

On the product page https://www.it-doneright.co.uk/shop/laptops-tablets/Laptops/lenovo-100e-chromebook-g2-laptop-11-6-celeron-n4020-4gb-32gb-emmc-webcam-wi-fi-no-lan-usb-c-chrome-os there is &nbsp; before the start of the text, so I need to remove the whitespace and the &nbsp;. On the product page https://www.it-doneright.co.uk/shop/components/Graphics-Cards/asrock-intel-arc-a310-lp-4g-pcie4-4gb-ddr6-hdmi-dp-2000mhz-clock-0db-cooling-low-profile there is &nbsp; &nbsp; before the list, so I need to remove that &nbsp; &nbsp; as well.

How can I do those two things, please, so there is no whitespace or non-breaking space before the start of the text or the start of the list?

Replace &nbsp; with a space, then trim() and then replace multiple spaces with a single space.
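A minimal sketch of those steps, assuming the text is valid UTF-8. It folds the replace and collapse steps into one regex, and it also catches the raw non-breaking space character (U+00A0) in case the feed stores that rather than the &nbsp; entity (the original code targeted \x{00a0}, so either may be present). The function name normaliseSpaces is just for illustration.

```php
<?php
// Hypothetical helper. Assumes $text is valid UTF-8: with the /u modifier,
// preg_replace() returns null for invalid UTF-8 input.
function normaliseSpaces(string $text): string
{
    // Any run of &nbsp; entities, raw non-breaking spaces (U+00A0) and
    // ordinary whitespace (spaces, tabs, newlines) becomes a single space...
    $text = preg_replace('~(?:&nbsp;|\x{00A0}|\s)+~u', ' ', $text);

    // ...and whatever single space is left at either end is trimmed off.
    return trim($text);
}
```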

Sorry, what would the code be for that? It goes past my PHP knowledge. Or is there an example link, please, that I could use to try to understand the code?

Hey ian114,

For keeping <ul> tags but stripping the rest, you can try strip_tags($final, '<ul><li>') instead of your current approach. This allows those specific tags while removing others.

I would do it like this:

$description = strip_tags($description, '<ul><li>'); // strip all HTML except <ul> and <li>
$description = str_replace('&nbsp;', ' ', $description); // replace &nbsp; with actual spaces
$description = preg_replace('~\s+~s', ' ', $description); // replace multiple spaces with single space
$description = trim($description); // strip any spaces at start and end of string
if (strlen($description) > 300) {
    $description = substr($description, 0, strrpos($description, ' ', -1 * (strlen($description) - 300))); // limit to 300 characters
}

Though the results of scripts like these are almost always sub-optimal. It’s better for a human to enter a summary for each product, if that’s feasible of course.
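On the truncation line in the code above: strrpos() with a negative offset searches backwards from near the end of the allowed range, so the cut lands on the last space around the 300th byte. Two caveats: strlen() and strrpos() count bytes, so characters such as £ or é use up more than one of the 300, and if there is no space in that range strrpos() returns false and substr() would return an empty string. A character-based alternative, sketched with preg_match() and the u modifier (truncateAtWord is a hypothetical name, and valid UTF-8 input is assumed):

```php
<?php
// Hypothetical helper: cut $text to at most $limit characters (not bytes),
// ending on a word boundary. Assumes $text is valid UTF-8.
function truncateAtWord(string $text, int $limit = 300): string
{
    // Short enough already: return unchanged (\z = very end of the string).
    if (preg_match('~^.{0,' . $limit . '}\z~su', $text)) {
        return $text;
    }

    // Longest prefix of at most $limit characters that is followed by whitespace.
    if (preg_match('~^.{0,' . $limit . '}(?=\s)~su', $text, $m) && $m[0] !== '') {
        return rtrim($m[0]);
    }

    // No usable space in range (one very long word): fall back to a hard cut.
    preg_match('~^.{0,' . $limit . '}~su', $text, $m);
    return $m[0];
}
```

The u modifier makes . match whole characters, so £ counts as one character, and the fallback avoids returning an empty summary.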

I tried that code but it’s not outputting any description text on the product page

Right, you need to add echo $description; at the end.

Sorry, I realised the code itself was correct apart from the missing echo, so I added echo $description and it output the text. However, it has only removed the h3 tags, not the text inside them. It has also applied the changes to the full-length description further down the page, and nothing was displayed under the Product Summary heading. See product page https://www.it-doneright.co.uk/shop/laptops-tablets/Laptops/lenovo-100e-chromebook-g2-laptop-11-6-celeron-n4020-4gb-32gb-emmc-webcam-wi-fi-no-lan-usb-c-chrome-os

Right, so the longer description was in $description, and we replaced it…

Okay, try this:

$productSummary = preg_replace('~<h3>.*?</h3>', '', $description); // strip h3 + contents
$productSummary = strip_tags($productSummary, '<ul><li>'); // strip all HTML except <ul> and <li>
$productSummary = str_replace('&nbsp;', ' ', $productSummary); // replace &nbsp; with actual spaces
$productSummary = preg_replace('~\s+~s', ' ', $productSummary); // replace multiple spaces with single space
$productSummary = trim($productSummary); // strip any spaces at start and end of string
if (strlen($productSummary) > 300) {
    $productSummary = substr($productSummary, 0, strrpos($productSummary, ' ', -1 * (strlen($productSummary) - 300))); // limit to 300 characters
}
echo $productSummary;

Thank you for the updated code. I tried it but it’s not outputting any text under the product summary heading
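The likely reason nothing appears: the first line's pattern '~<h3>.*?</h3>' has no closing ~ delimiter. PHP can't compile it, emits a warning and preg_replace() returns null, so $productSummary ends up empty and the echo prints nothing. Separately, . does not match a newline unless the s modifier is used, so a heading that spans two lines would survive even a valid pattern. A small demonstration with a made-up sample:

```php
<?php
$html = "<h3>ASRock Intel Arc\nA310</h3><ul><li>4GB</li></ul>"; // made-up sample

// No closing ~ delimiter: the pattern can't be compiled, PHP emits a warning
// and preg_replace() returns null. (@ only hides the warning for this demo.)
$broken = @preg_replace('~<h3>.*?</h3>', '', $html);

// Valid pattern, but '.' does not match "\n" without the s modifier,
// so a heading split across two lines is left in place.
$noDotAll = preg_replace('~<h3>.*?</h3>~', '', $html);

// s lets '.' cross newlines, i also matches <H3>, [^>]* allows attributes.
$fixed = preg_replace('~<h3\b[^>]*>.*?</h3>~is', '', $html);
```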

Obligatory thought: Why are we trying to JIT fix data for display, rather than… fixing it in the database?

The text content and the HTML tags are imported from the supplier's product feed on their FTP server and stored in the database exactly as they appear in the feed. I’m unsure whether fixing it in the database would work, because the feed is checked every hour or two, products are updated, and new products are added often.

Try this:

$productSummary = preg_replace('~<h3>.*?</h3>~', '', $description); // strip h3 + contents
$productSummary = strip_tags($productSummary, '<ul><li>'); // strip all HTML except <ul> and <li>
$productSummary = str_replace('&nbsp;', ' ', $productSummary); // replace &nbsp; with actual spaces
$productSummary = preg_replace('~\s+~', ' ', $productSummary); // replace multiple spaces with single space
$productSummary = trim($productSummary); // strip any spaces at start and end of string
if (strlen($productSummary) > 300) {
    $productSummary = substr($productSummary, 0, strrpos($productSummary, ' ', -1 * (strlen($productSummary) - 300))); // limit to 300 characters
}
echo $productSummary;

“and stored in the database”.

Stored in the database by what? Why can’t THAT process contain the logic to strip out the necessary tags, leaving you with a simple echo when you need it?

You’ll save your server a lot of processing power (if not database space) by filtering the data on the one-time way in, rather than on the multiple-time way out.

That’s worked; there are just a couple of small issues to sort out, if possible. On the product URL https://www.it-doneright.co.uk/shop/laptops-tablets/Laptops/lenovo-100e-chromebook-g2-laptop-11-6-celeron-n4020-4gb-32gb-emmc-webcam-wi-fi-no-lan-usb-c-chrome-os there is &nbsp; before the text, and on the product URL https://www.it-doneright.co.uk/shop/components/Graphics-Cards/asrock-intel-arc-a310-lp-4g-pcie4-4gb-ddr6-hdmi-dp-2000mhz-clock-0db-cooling-low-profile there is &nbsp; &nbsp; before the list starts. Also, the reviews are being put inside the last <li> tag: the <form> tag should come after the closing </li></ul> tags, but it looks like those are being added after the closing </form> tag instead.
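A likely cause of the reviews ending up inside the last <li> (an assumption, since it depends on the page source): the 300-character cut can land in the middle of the list, leaving <ul> and <li> open, or even cut a tag in half, such as a trailing "<li". Browsers then treat the markup that follows, including the review <form>, as part of that unfinished list item, and the closing tags that come later get matched up elsewhere. One minimal sketch of a fix is to drop any half-cut tag and close whatever <ul>/<li> tags are still open; closeOpenListTags is a hypothetical helper that only handles these two tags.

```php
<?php
// Hypothetical helper: repair a truncated summary that may only contain
// <ul> and <li> tags (everything else was already removed by strip_tags()).
function closeOpenListTags(string $html): string
{
    // Drop a tag that the truncation cut in half, e.g. a trailing "<li" or "</u".
    $html = preg_replace('~<[^>]*$~', '', $html);

    // Walk the <ul>/<li> tags in order, keeping a stack of the open ones.
    $open = [];
    preg_match_all('~<(/?)(ul|li)\b[^>]*>~i', $html, $tags, PREG_SET_ORDER);
    foreach ($tags as $tag) {
        $name = strtolower($tag[2]);
        if ($tag[1] === '') {
            $open[] = $name;              // opening tag: push it
        } elseif (end($open) === $name) {
            array_pop($open);             // matching closing tag: pop it
        }
    }

    // Close whatever is still open, innermost first.
    foreach (array_reverse($open) as $name) {
        $html .= "</$name>";
    }
    return $html;
}
```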

The import extension I am using imports the product data from the supplier's product feed and stores it in the product database table. I don’t think the OpenCart import extension I am using can strip out the necessary tags, and even if it could, that might change the look of the full-length description further down the page.
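If the summary can be produced once at import time rather than on every page view, one option that leaves the full description untouched is to store the summary in a separate field. Whether your OpenCart import extension offers a hook for this is an assumption I can't check; buildProductSummary and the separate summary field are hypothetical. A sketch of the whole pipeline discussed in this thread as one function (as in the code above, tag characters count towards the limit):

```php
<?php
// Hypothetical helper: turn a supplier description into a short summary.
// Meant to run once per product when it is imported or updated, with the
// result saved in a separate (hypothetical) summary field, so the full
// description further down the page stays exactly as supplied.
// Assumes the description is valid UTF-8.
function buildProductSummary(string $description, int $limit = 300): string
{
    // 1. Remove <h3> headings together with their text.
    $text = preg_replace('~<h3\b[^>]*>.*?</h3>~is', '', $description);

    // 2. Keep only list markup; every other tag is stripped (its text stays).
    $text = strip_tags($text, '<ul><li>');

    // 3. &nbsp;, raw U+00A0 and other whitespace runs become one space; trim ends.
    $text = trim(preg_replace('~(?:&nbsp;|\x{00A0}|\s)+~u', ' ', $text));

    // 4. If too long, cut to the longest prefix of at most $limit characters
    //    that ends before a space (left uncut if there is no such space).
    if (!preg_match('~^.{0,' . $limit . '}\z~su', $text)
        && preg_match('~^.{0,' . $limit . '}(?=\s)~su', $text, $m)) {
        $text = rtrim($m[0]);
    }

    // 5. Drop a tag cut in half at the end, then close any open <ul>/<li>.
    $text = preg_replace('~<[^>]*$~', '', $text);
    $open = [];
    preg_match_all('~<(/?)(ul|li)\b[^>]*>~i', $text, $tags, PREG_SET_ORDER);
    foreach ($tags as $tag) {
        $name = strtolower($tag[2]);
        if ($tag[1] === '') {
            $open[] = $name;
        } elseif (end($open) === $name) {
            array_pop($open);
        }
    }
    foreach (array_reverse($open) as $name) {
        $text .= "</$name>";
    }

    return $text;
}

// Made-up sample in the style of the feed.
$feed = "<h3>ASRock Intel Arc A310</h3><br> &nbsp; &nbsp;<ul><li>4GB GDDR6 memory</li><li>Low profile bracket</li></ul>";
echo buildProductSummary($feed, 30), "\n"; // <ul><li>4GB GDDR6</li></ul>
```

The page template would then only need to echo the stored summary, which is the "simple echo" suggested above.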