• kbin_space_program@kbin.run
    arrow-up
    20
    arrow-down
    23
    ·
    edit-2
    5 months ago

    They state that they only turn on “when you say the special phrase.”

    But in order to do that, they have to be always listening and parsing what you say.

    And in order to pay for that processing time, it’s getting processed for any data they can sell ads on.

    • WolfLink@sh.itjust.works
      English
      arrow-up
      42
      ·
      5 months ago

      That’s not necessarily true. Detection of the trigger phrase is simple enough that it can be done locally. If they are sending all your audio to their servers, it’s not because they need to.

      • Toribor@corndog.social
        English
        arrow-up
        42
        ·
        5 months ago

        It drives me crazy that people insist they are sending a constant audio stream somewhere for nefarious purposes without any evidence. From a networking perspective, this is knowable information.

      • kbin_space_program@kbin.run
        arrow-up
        4
        arrow-down
        26
        ·
        5 months ago

        Publicly they’ve stated that it does that.

        However, it wouldn’t be the first time Apple, Amazon and particularly Google have lied.

        • cm0002@lemmy.world
          English
          arrow-up
          20
          ·
          5 months ago

          It’s verifiable, you can observe the connections it makes.

          Admittedly, you can’t see the contents of the packets themselves, but you can easily tell anyway if it’s doing anything close to sending a constant stream of audio.
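To put a rough number on why a constant audio stream would stand out even with encrypted packets, here is a back-of-the-envelope sketch. The sample rate, bit depth and compressed bitrate are illustrative assumptions, not figures from any particular device.

```python
# Back-of-the-envelope estimate of the upstream traffic a constant
# audio stream would generate. All figures are illustrative assumptions.

SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_day(bitrate_bps: int) -> int:
    """Bytes uploaded in 24 hours at a constant bitrate (bits per second)."""
    return bitrate_bps * SECONDS_PER_DAY // 8


# Assumption: uncompressed 16 kHz, 16-bit, mono PCM.
RAW_PCM_BPS = 16_000 * 16 * 1
# Assumption: a heavily compressed speech stream at about 16 kbit/s.
COMPRESSED_BPS = 16_000

if __name__ == "__main__":
    for label, bps in [("raw PCM", RAW_PCM_BPS), ("compressed", COMPRESSED_BPS)]:
        megabytes = bytes_per_day(bps) / 1_000_000
        print(f"{label}: {megabytes:,.1f} MB/day")
```

Under these assumptions, even the compressed case adds up to well over a hundred megabytes per day, which is hard to hide behind occasional update checks.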

          • Ekky@sopuli.xyz
            English
            arrow-up
            2
            arrow-down
            10
            ·
            5 months ago

            Assuming that they parse everything locally, which appears to be the case, why would it have to send a constant stream of audio? A small list/packet of keywords of a few bytes or KB once a day would suffice for most telemetry needs (including ad analysis and other possible spying purposes).

            Also, one ought to be able to see the contents of the packets if they retrieve the device’s SSL key for the session, so this should also be falsifiable.

            • cm0002@lemmy.world
              English
              arrow-up
              10
              ·
              5 months ago

              Most of the Google Home speakers do not have the processing capacity for true local processing.

              Local processing in the context of a smart home speaker means searching for a certain trigger keyword and nothing else; this doesn’t require much oomph locally.

              A system like the one you describe is totally possible, but not with the hardware you find in the average smart speaker; a constant stream of audio would need to be sent off to the cloud somewhere.

              Also, yeah, it’s not impossible to drop in on an SSL connection, but the embedded nature of the speakers makes it a bit more difficult.

              • Ekky@sopuli.xyz
                English
                arrow-up
                1
                arrow-down
                4
                ·
                5 months ago

                Thank you for the explanation, though the underlying requirements for keeping a list locally appear to remain much the same, since you really only need to add a few trigger words to the “dumb, always-on” local parser (such as your top 1000 advertisers’ company or product names). After all, I’d imagine we do not require context, but only really need to know whether a word was said or not, not unlike listening for the “real” trigger word.

                This is of course only one of many ways to attack such a problem, and I do not know how they would ultimately do it, assuming that they were interested in listening in on their users in the first place.

                And yes, embedded devices are slightly harder to fiddle with than using your own computer, but I’d bet that they didn’t actually take the time to make a proper gate array and instead just use some barebones Linux, which most likely means UART access!

        • Revan343@lemmy.ca
          English
          arrow-up
          8
          ·
          5 months ago

          If they were constantly recording and sending that data home, it would have been noticed very quickly; all it takes is one nerd running Wireshark.

    • huginn@feddit.it
      English
      arrow-up
      21
      ·
      5 months ago

      They process locally. You can watch their traffic: there’s very little going out besides their own diagnostics.

      So you pay for the processing with your own electricity.

      • dual_sport_dork 🐧🗡️@lemmy.world
        English
        arrow-up
        13
        arrow-down
        1
        ·
        5 months ago

        So you pay for the processing with your own electricity

        Yes, that is how I would much rather my computers work and, in fact, how they have historically done so.

        • huginn@feddit.it
          English
          arrow-up
          3
          ·
          5 months ago

          Yeah but that’s in contrast to OP above saying that the companies have to pay for processing with ads.

    • JackbyDev@programming.dev
      English
      arrow-up
      15
      arrow-down
      1
      ·
      edit-2
      5 months ago

      No dude, they don’t send shit to the cloud to process. It just stores like 5 seconds of voice locally and listens for the wake word. This is why you can only choose a few wake words and not pick anything arbitrary. I’m all for criticizing big tech, but don’t lie about how it works.

      Edit: Small correction, they of course send the buffer and begin properly recording once it detects the wake word. Locally it can only detect Alexa and any other wake words it can respond to.
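The rolling buffer described above can be sketched roughly as follows. This is a minimal illustration, not how any real device is implemented: `WakeWordBuffer`, the frame size and the `detect` callback are all hypothetical names, and the detector here is a stand-in for whatever on-device model a speaker actually uses.

```python
from collections import deque
from typing import Callable, Deque, List, Optional

Frame = bytes  # one short chunk of audio (assumed, e.g. a few tens of ms)


class WakeWordBuffer:
    """Keeps only the most recent frames; hands them over when the wake word fires."""

    def __init__(self, max_frames: int, detect: Callable[[Frame], bool]) -> None:
        # deque(maxlen=...) silently drops the oldest frame once full,
        # so older audio is discarded rather than stored.
        self._frames: Deque[Frame] = deque(maxlen=max_frames)
        self._detect = detect

    def push(self, frame: Frame) -> Optional[List[Frame]]:
        """Add a frame; return the buffered audio only if the wake word was detected."""
        self._frames.append(frame)
        if self._detect(frame):
            buffered = list(self._frames)
            self._frames.clear()
            return buffered  # this is what would be sent on for full processing
        return None


if __name__ == "__main__":
    # Hypothetical detector: treats one specific frame as the wake word.
    buf = WakeWordBuffer(max_frames=3, detect=lambda f: f == b"wake")
    for chunk in [b"a", b"b", b"c", b"d", b"wake"]:
        released = buf.push(chunk)
    print(released)
```

Until the detector fires, `push` returns `None` and nothing leaves the buffer; the fixed `maxlen` is what limits it to the last few seconds.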

      • SpaceCowboy@lemmy.ca
        English
        arrow-up
        7
        arrow-down
        1
        ·
        5 months ago

        Yeah, right? It’s technology not magic. Anyone can monitor the traffic from a device on a network and if it were sending a significant amount of data when not activated, every third party security researcher would know within minutes. It would be well publicized by respected security research organizations if they were constantly sending voice data.
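As a rough illustration of the kind of check described here, the sketch below takes per-packet records of upstream traffic and flags a sustained upload. The record format `(timestamp, bytes)`, the function name `has_sustained_upload` and the thresholds are all assumptions made for this example; real capture tools export richer data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Packet = Tuple[float, int]  # (timestamp in seconds, bytes sent upstream)


def has_sustained_upload(
    packets: Iterable[Packet],
    window_s: float = 60.0,
    threshold_bps: float = 8_000.0,
    min_consecutive: int = 5,
) -> bool:
    """True if the average upload rate stays at or above threshold_bps
    for at least min_consecutive back-to-back windows."""
    totals: Dict[int, int] = defaultdict(int)
    for timestamp, size in packets:
        totals[int(timestamp // window_s)] += size

    run = 0
    previous = None  # index of the last busy window, if the run is unbroken
    for index in sorted(totals):
        busy = totals[index] * 8 / window_s >= threshold_bps
        if busy and previous is not None and index == previous + 1:
            run += 1
        elif busy:
            run = 1
        else:
            run = 0
        previous = index if busy else None
        if run >= min_consecutive:
            return True
    return False


if __name__ == "__main__":
    # Synthetic data: 2000 bytes every second for 10 minutes vs. a small ping every 30 s.
    streaming = [(float(t), 2000) for t in range(600)]
    idle = [(float(t), 500) for t in range(0, 600, 30)]
    print(has_sustained_upload(streaming), has_sustained_upload(idle))
```

A single large burst, such as a firmware download acknowledgement or one voice request, only fills one window and is not flagged; only traffic that stays high across several windows in a row is.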

    • Pandantic [they/them]@midwest.social
      English
      arrow-up
      7
      arrow-down
      2
      ·
      edit-2
      5 months ago

      I use Alexa, but only on touch button. Still easy and convenient, less “always listening”.

      I know there will be a comment about how they’re already always listening, but I choose not to believe that because I haven’t given up on the world yet. 😑

      Edit: though I must admit, I take precautions at times!

    • Gerudo@lemm.ee
      English
      arrow-up
      5
      arrow-down
      1
      ·
      5 months ago

      Publicly, they state it is a rolling 5-10 second analyzer, and nothing gets recorded until you say the word.

    • grue@lemmy.world
      English
      arrow-up
      1
      arrow-down
      1
      ·
      5 months ago

      Allegedly, the processing to listen for the activation phrase is done locally.

      • Revan343@lemmy.ca
        English
        arrow-up
        9
        ·
        edit-2
        5 months ago

        Not just allegedly, verifiably. It’s simple enough to check with Wireshark.

        • grue@lemmy.world
          English
          arrow-up
          2
          ·
          5 months ago

          I don’t have one of those devices, and didn’t want to exclude the possibility that it was “chatty” enough with its server (checking for updates etc.) that a speech analysis request couldn’t be hidden within the noise.

          • Revan343@lemmy.ca
            English
            arrow-up
            1
            ·
            5 months ago

            Fair enough. I’ve never checked myself because I’m also not interested in having that sort of thing, but I’ve read a few blog articles by people who have.