• 0 Posts
  • 43 Comments
Joined 1 year ago
cake
Cake day: June 30th, 2023

help-circle



  • Ok. I mean I have no idea how government agencies organise this. If these are exceptional circumstances where a system needs exposing to “every csam image ever” then I would reasonably assume that justifies the one off cost of making the circumstances exceptionally secure. It’s not like they’re doing that every day.

    You raise a separate important point. How is this technology actually used in practice? It sounds like photoDNA, being based on hashes, is a strictly one way thing, information is destroyed in the production of the hashing model. And the result can only be used to score a likelihood that a new image is csam or not. But AI is not like that, the input images are used to train a model and while the original images don’t exist in there, it distills the ‘essence’ of what those photos are down into their encoded essence. And as such an AI model can be used both for detection and generation.

    All this to say, perhaps there are ways for photoDNA to be embedded in systems safely, so that suspect csam doesn’t have to be transmitted elsewhere. But I don’t think an AI model of that type is safe to deploy anywhere. It feels like it would be too easy for the unscrupulous to engineer the AI to generate csam instead of detect it. So I would guess the AI solution would mean hosting the model in one secure place and suspect images having to be sent to it. But is that really scalable. It’s a huge amount of suspect images from all sorts of messaging platforms we’re talking about.


  • Wow I had no idea

    It sounds like a much needed improvement then!

    Any idea if Photo DNA needs training sets to the same extent AI does? It still feels like training currently LLM models, at least how I think they work, requires vast amounts of “examples”

    It still feels like that amounts to putting huge amounts of task csam just “out there” with tech companies. If it saves a bunch of human moderators the toil of having to review quite so much then that’s definitely a great help. But can you say anything about the comparative scale of the content involved? My impression is that previous versions of something like photoDNA would need a set of something for testing purposes. But the quantity needed to train AI is going to be vastly bigger (and therefore it’s possible leak vastly worse?)


  • That’s kinda of what I was alluding to. If they have zero op sec, they’re almost certainly sharing known csam too, and that’s the kind of stuff where just the hashes can be used to catch them. But the hashes can be safely shared with any messaging service or even OS developer, because the hashes aren’t csam themselves.

    What I was calling “risky” about the above is it sounds like the first time law enforcement are sharing actual csam with a technology company so that that company can train an AI model on it.

    Law enforcement have very well developed processes and safe guards about who can access csam and why and it’s thoroughly logged and scrutinised and supported with therapy and so on .

    Call me skeptical that these data companies that are putting in tenders to receive csam and develop models are going to have anywhere near the suitable level of safeguard and checks. Lowest bidder and all that.

    So it all seems like a risky endeavour, and really it’s only going to catch - as you say - your zero op sec paedo, but those people were going to get caught anyway, sharing regular csam detected with hashes.

    So it seems like it has a really narrow target. And undertaken with significant risk. Seems like someone just wants to show “they’re doing something”. Or some data company made a reeeally glossy brochure…