• starweasel [it/its, comrade/them]@hexbear.net
      9 hours ago

      scripts that NVIDIA distributed to clients so they could automatically download and preprocess The Pile dataset.

      sounds like they allegedly wrote some stuff to get faster downloads/avoid throttling while they were allegedly pirating books from shadow libraries for their AI

    • Chahk@beehaw.org
      9 hours ago

      In addition, the motion also targets the contributory copyright infringement allegations, which center on scripts and tools NVIDIA allegedly distributed so corporate customers could automatically download ‘The Pile,’ the dataset that contains Books3.