In the days after the US Department of Justice (DOJ) published 3.5 million pages of documents related to the late sex offender Jeffrey Epstein, multiple users on X have asked Grok to “unblur” images or remove the black boxes that cover the faces of children and women to protect their privacy.

  • AnarchistArtificer@slrpnk.net
    27 days ago

    The datasets these models are trained on do in fact include CSAM. The datasets are so huge that it easily slips through the cracks. It’s usually removed whenever it’s found, but I don’t know how this actually affects the AI models that have already been trained on that data. To my knowledge, it’s not possible to selectively “untrain” models, so they would need to be retrained from scratch. Plus, I occasionally see news reports about new CSAM being found in the training data.

    It’s one of the many, many problems with generative AI.