Remove the String Inside Quotes While Keeping the Surrounding String

Let’s say I have images that serve no purpose other than to add visuals to my website, but they still carry alternate text. Here’s an example:

<img src="..." alt="my first picture">
<img src="..." alt="my second picture">

I’ve been trying to remove whatever text is in the alt attribute of these images. This is the command I have been running to remove the alternate text of images that are considered decorative:

find modl/posts/ -name "*.html" -exec sed 's/alt="[^ \/]*\( .*\)">/alt="">/g' {} \;

I want the result to look like this:

<img src="..." alt="">
<img src="..." alt="">

How would I accomplish this with one command, without having to go through each HTML file by hand to remove what is in the alt="" attribute?

PS: I ask that you please read this post only in its own context. Success Criterion 1.1.1 of the Web Content Accessibility Guidelines covers different types of images. I need to remove the alternate text inside the quotes for images that only serve a purpose (“why?”) and do not tell what the content is about (“what?”).

Thank you.

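import os
from bs4 import BeautifulSoup

myDir = "PATH_TO_THE_FOLDER_WITH_THE_HTML-FILES"
for f in os.listdir(myDir):
    if f.endswith(".html"):
        path = os.path.join(myDir, f)
        # Parse the contents of the file, not its name
        with open(path, encoding="utf-8") as fh:
            soup = BeautifulSoup(fh.read(), "html.parser")
        # alt is an attribute of <img>, not a tag of its own
        for img in soup.find_all("img", alt=True):
            img["alt"] = ""
        # Write the modified markup back to the same file
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(str(soup))

Save the content to a file such as replace_alt.py

chmod +x replace_alt.py

Install BeautifulSoup if needed:
pip3 install beautifulsoup4

Run the file: python3 replace_alt.py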
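
If installing BeautifulSoup is not an option, a regex-only variant using just the standard library might look like the sketch below. The name blank_img_alt is made up for illustration, and the pattern assumes that alt values are double-quoted and that no attribute value inside an <img> tag contains a ">" character. Regexes are fragile on arbitrary HTML, so treat this as a starting point rather than a robust solution.

```python
import re

# Captures an <img ...> tag up to "alt=", followed by a double-quoted value.
# Assumptions: alt values are double-quoted, and no attribute value
# inside the tag contains ">".
IMG_ALT = re.compile(r'(<img\b[^>]*?\salt=)"[^"]*"', re.IGNORECASE)


def blank_img_alt(html: str) -> str:
    """Return html with the alt value of every <img> tag emptied."""
    return IMG_ALT.sub(r'\1""', html)


sample = (
    '<img src="..." alt="my first picture">\n'
    '<img src="..." alt="my second picture">'
)
print(blank_img_alt(sample))
```

To apply it to files, read each file's text, pass it through blank_img_alt, and write the result back, the same way the script above does with the parsed soup.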


Thank you very much, but I’m good. I have not been here in a while.