The Economics of Piracy - 3

An interesting email-exchange between
Brandon Van Every and Russ Williams
(September 1998)

Courtesy of fravia's pages of reverse engineering

Welcome to the third part of a very interesting email exchange between some 'real' games programmers. Even if the main concern of these guys is to avoid CD-ROM pirating, some of the tricks they propose and evaluate are quite relevant for all reverse engineering enthusiasts, as you'll read. I have added very few comments.

Russ Williams wrote in message <906206116.15699.0.nnrp-
>
>Well, if you do the security on every single disc, then the QA
>would be done on the protected game - if it goes wrong,
>the testers will bitch about it...


But the lead programmer could screw up by generating a non-unique ID that
doesn't crash the game.  Like, he grabbed the wrong triangle because the
engineers, not knowing any better, moved something.  You'd need QA that
actually knows what the identification process is supposed to be.  Either
that or an infallible lead programmer.  :-)

>OK. Fair point. Something like using the least significant
>bits of each byte in a BMP body would be bad because it
>could be wiped out trivially once detected.

Rather, "could be wiped out trivially."  The only thing that needs detecting
is that BMP data exists.  Then you wipe the BMP data, you don't need to know
if it's an ID source or not.  You just need to add enough noise to the data
that whatever ID info it might have been carrying is completely scrambled.
Although past a certain point of perturbation, the perturbed BMP data could
become meaningless.  Ruin all the artwork, maybe this isn't advantageous to
the cracker after all.  Then again, maybe people would be happy with games
in new color schemes.
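
To make the point concrete, here is a minimal Python sketch. It is not anything from the thread itself: it assumes a hypothetical 32-bit disc ID hidden in the least significant bits of the first 32 bytes of a BMP body, and the names embed_id, extract_id and scrub_lsbs are invented for illustration. The cracker never looks for the ID. Randomizing every LSB changes each byte by at most 1, and whatever was hidden there is scrambled.

```python
import random

def embed_id(pixels, disc_id, bits=32):
    """Hide disc_id in the least significant bit of the first `bits` bytes."""
    out = bytearray(pixels)
    for i in range(bits):
        bit = (disc_id >> i) & 1
        out[i] = (out[i] & 0xFE) | bit
    return bytes(out)

def extract_id(pixels, bits=32):
    """Read the ID back from the least significant bits."""
    return sum((pixels[i] & 1) << i for i in range(bits))

def scrub_lsbs(pixels, rng):
    """Cracker's move: randomize every LSB without knowing where the ID is."""
    return bytes((b & 0xFE) | rng.getrandbits(1) for b in pixels)

rng = random.Random(1998)
body = bytes(rng.randrange(256) for _ in range(1024))  # stand-in for a BMP body
marked = embed_id(body, 0xC0FFEE42)                    # hypothetical disc ID
scrubbed = scrub_lsbs(marked, rng)

# Largest per-byte change caused by the scrub (never more than 1).
max_change = max(abs(a - b) for a, b in zip(marked, scrubbed))
```

Here extract_id(marked) returns the ID that was embedded, while extract_id(scrubbed) returns 32 random bits, and max_change shows that no pixel byte moved by more than one step.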


>>The *statistical* part comes from how many disks you
>>release to the world. The odds of the extremely determined
>>cracker getting ahold of 1, or 2, of them.
>
>Yup. The way I'd counter that is to provide codes grouped
>on 3 sources. That way you need nC3 keys, but a disc from
><=3 sources will have a piece of the data that's identical
>in all 3 versions and identifies which 3 leaked. If only
>2 leak, then there will be many more keys to identify
>which 2. Obviously, this is trivially expandable to any
>number in a group (in case you think 42 discs will go
>walkabout).


It's late... can you explain this one from the top?  You want to somehow
scramble data from 16 different disks into each other?  One thing I'm really
missing is what exactly you're scrambling.  Different source pools?  No that
can't be it... it's late.

>It could work for levels if you use lots and lots of strong
>encryption and chain levels together (ie: the key for level
>n is in the data for level n-1) and alter things after some
>number of levels. They'd crack the first dozen levels, say,
>figure it's working, but have missed that the encryption
>method changes for level 13..


The decryption method is always contained in the binary somewhere.  So
you're just presuming about the carefulness/carelessness of the cracker.
There are a million needle-in-a-haystack strategies out there, all can be
defeated with sufficient patience.  Just a matter of how much you enjoyed
writing it, vs. how much the cracker enjoys cracking it.
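
Russ's chained-level idea and Brandon's objection can both be seen in a small Python sketch. Everything here is an assumption made for illustration: the toy hash-based XOR stream cipher (not a vetted cipher), the 16-byte key length, the function names, and the choice of SHA-256 switching to SHA-512 as the "method change". The key for each level is stored at the front of the previous level's plaintext, and the keystream method changes at a chosen level.

```python
import hashlib

KEY_LEN = 16  # assumed key size for this sketch

def keystream(key, length, algo):
    """Toy keystream: concatenated hash(key || counter) blocks."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.new(algo, key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_crypt(data, key, algo):
    """XOR with the keystream; the same call encrypts and decrypts."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data), algo)))

def algo_for(level, switch_at):
    """The method silently changes from level index `switch_at` onward."""
    return "sha256" if level < switch_at else "sha512"

def pack_levels(levels, first_key, switch_at):
    """Encrypt each level; its plaintext carries the key for the next one."""
    packed, key = [], first_key
    for n, content in enumerate(levels):
        next_key = hashlib.sha256(b"next" + key).digest()[:KEY_LEN]
        packed.append(xor_crypt(next_key + content, key, algo_for(n, switch_at)))
        key = next_key
    return packed

def unpack_levels(packed, first_key, switch_at):
    """Walk the chain: decrypt a level, peel off the key for the next."""
    levels, key = [], first_key
    for n, blob in enumerate(packed):
        plain = xor_crypt(blob, key, algo_for(n, switch_at))
        key, content = plain[:KEY_LEN], plain[KEY_LEN:]
        levels.append(content)
    return levels

first_key = b"K" * KEY_LEN
levels = [b"level %d" % n for n in range(16)]
packed = pack_levels(levels, first_key, switch_at=12)

game = unpack_levels(packed, first_key, switch_at=12)    # knows about the switch
cracker = unpack_levels(packed, first_key, switch_at=99)  # assumes one method
```

The cracker's tool recovers the first dozen levels correctly and produces garbage from the thirteenth on. Brandon's objection still stands: algo_for ships inside the binary, so a careful cracker reads it there.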

>>No, you don't want it in the raw data.  The raw data is easy
>>to perturb randomly and still get basically the same data.
>
>That depends how you place it. Ultra low frequency
>components across the whole dataset would be much
>more difficult to remove. Imagine a sample with a 4Hz
>sine wave mixed in - it would be undetectable by ear
>and noise wouldn't remove it.

So you run it under an audio filter and chop anything the ear can't hear.
Then you do some nasty things to what you can hear, so that the data is now
sufficiently different.
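
Both claims can be checked with a short Python sketch. The numbers are assumptions chosen for illustration: an 8000 Hz sample rate, a 440 Hz tone standing in for the audible content, a 4 Hz mark at 5% of its amplitude, and a 40 Hz cutoff (a real attacker would pick the cutoff to suit the material). The filter is the ordinary first-order RC high-pass, applied twice. Plain noise leaves the 4 Hz component measurable, as Russ says; the high-pass filter suppresses it, as Brandon says.

```python
import math
import random

RATE = 8000  # samples per second (assumed)
N = RATE     # one second of audio

def tone(freq, amp):
    """One second of a sine wave at `freq` Hz."""
    return [amp * math.sin(2 * math.pi * freq * n / RATE) for n in range(N)]

def amplitude_at(signal, freq):
    """Single-bin DFT: amplitude of the `freq` Hz component of `signal`."""
    s = sum(x * math.sin(2 * math.pi * freq * n / RATE) for n, x in enumerate(signal))
    c = sum(x * math.cos(2 * math.pi * freq * n / RATE) for n, x in enumerate(signal))
    return 2 * math.hypot(s, c) / len(signal)

def high_pass(signal, cutoff):
    """First-order RC high-pass filter with the given cutoff in Hz."""
    rc = 1 / (2 * math.pi * cutoff)
    a = rc / (rc + 1 / RATE)
    out = [signal[0]]
    for i in range(1, len(signal)):
        out.append(a * (out[-1] + signal[i] - signal[i - 1]))
    return out

rng = random.Random(4)
marked = [m + w for m, w in zip(tone(440, 1.0), tone(4, 0.05))]
noisy = [x + rng.uniform(-0.1, 0.1) for x in marked]  # noise louder than the mark
filtered = high_pass(high_pass(marked, 40), 40)       # two filter passes

mark_in_marked = amplitude_at(marked, 4)
mark_in_noisy = amplitude_at(noisy, 4)
mark_in_filtered = amplitude_at(filtered, 4)
music_in_filtered = amplitude_at(filtered, 440)
```

Comparing mark_in_noisy with mark_in_filtered shows the difference between the two attacks, and music_in_filtered shows that the audible tone comes through the filter largely intact.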

>>You want to stick your key in the INDEX STRUCTURE
>>of the data.
>
>Too obvious.


Again, the goal is not "find the key!"  The goal is to erase the key.  An
index structure is selected because it'll take the most time to erase, you
have to figure out how the damn thing works before you can do it.

>>Somewhere that takes quite a while to figure out how
>>to transform without breaking everything.
>
>But, as you've said, there are crackers willing to spend
>any amount of time doing the grunt work..


Well, what kind of index file would be so onerous that only the most
foolhardy cracker would attempt it?

>>To reiterate, you don't FIND the key.  You eradicate
>>everything, including the key.
>
>It's not always that simple. Unless the crackers are going
>to rip out every image, every sample, every piece of
>data from the game then they're not going to be 100%
>effective.


They are going to do exactly that.  They are going to develop automated
methods for ripping everything to shreds.  Hence why encoding in BMP files
doesn't work, the structure of the file is well-known and easily
transformed.  The index methods are not well-known, they're unique to each
app.  So the goal is to think up a really onerous, convoluted one.
Something so horrid you'd need months of test-debug iteration to figure it
all out.

;True! Yet the debugging overload is something that programmers that
;are not crackers as well will not endure!

>[...]
>>Incidentally, if you're willing to burn your CDs one at a time,
>>then you could use the same data transformation methods
>>to encode the unique identity of a file.  Rather than sticking
>>a unique ID on each disk somewhere within the file, and
>>running the risk of 2 files being compared, you make the data
>>file on *every* CD unique.
>
>That was the idea of #2 above - a compressed and
>encrypted data set can't be compared meaningfully if
>the encryption key changes between builds. You need
>to spend ages decompressing before you can
>compare.


No, your technique is different.  I'm not encrypting anything on the CDs,
I'm just guaranteeing that the entire data file has a unique bit pattern for
each CD.  The entire file becomes the ID.  I can read my unique file without
a decryption mechanism.  The problem with your decryption, is that once the
file is decrypted it's the same regardless of what CD it came from.  And the
file *can* be decrypted, the code is always available to do this in the
binary itself.
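
Brandon does not say how he would make every file unique, so the following Python sketch is only one hypothetical way to do it. It assumes a made-up file layout (a count, then an offset/length index, then the asset bodies) and uses the disc serial to choose the physical order of the bodies. Every disc holds the same assets, readable through the index with no decryption, yet the files differ and the layout itself encodes the serial. All function names are invented for illustration.

```python
import struct

def permutation_from_serial(n, serial):
    """Decode serial (0 <= serial < n!) into a unique ordering of range(n)."""
    pool, order = list(range(n)), []
    for size in range(n, 0, -1):
        serial, pick = divmod(serial, size)
        order.append(pool.pop(pick))
    return order

def serial_from_permutation(order):
    """Inverse of permutation_from_serial."""
    pool, serial, radix = list(range(len(order))), 0, 1
    for size, slot in zip(range(len(order), 0, -1), order):
        pick = pool.index(slot)
        pool.pop(pick)
        serial += pick * radix
        radix *= size
    return serial

def build_disc_file(assets, disc_serial):
    """Lay out the (non-empty) assets in a serial-dependent physical order."""
    order = permutation_from_serial(len(assets), disc_serial)
    header_size = 4 + 8 * len(assets)
    index = [None] * len(assets)
    body = bytearray()
    for slot in order:
        index[slot] = (header_size + len(body), len(assets[slot]))
        body += assets[slot]
    header = struct.pack("<I", len(assets))
    for offset, length in index:
        header += struct.pack("<II", offset, length)
    return header + bytes(body)

def read_asset(disc_file, slot):
    """The game reads through the index; nothing is decrypted."""
    offset, length = struct.unpack_from("<II", disc_file, 4 + 8 * slot)
    return disc_file[offset:offset + length]

def recover_serial(disc_file):
    """The publisher reads the serial back from the physical layout."""
    (count,) = struct.unpack_from("<I", disc_file, 0)
    offsets = [struct.unpack_from("<II", disc_file, 4 + 8 * s)[0] for s in range(count)]
    order = sorted(range(count), key=lambda s: offsets[s])
    return serial_from_permutation(order)

assets = [b"level-one", b"textures", b"music", b"sprites", b"cutscene", b"fonts"]
disc_a = build_disc_file(assets, disc_serial=1)
disc_b = build_disc_file(assets, disc_serial=500)
```

An index this simple is easy to understand and repack, which is the objection Russ raises about index formats in general. Brandon's argument depends on the real structure being far more convoluted than this toy.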


Cheers,                    3d graphics optimization jock
Brandon Van Every          Seattle, WA
-----------------------------------------------------------------------
If we are all Gods         and we have thrown our toys the mortals away
and we are Immortal        What shall we do
and we cannot die          to entertain ourselves?

Brandon Van Every wrote:
>Russ Williams wrote:
[...]
>>>The *statistical* part comes from how many disks you
>>>release to the world. The odds of the extremely determined
>>>cracker getting ahold of 1, or 2, of them.
>>
>>Yup. The way I'd counter that is to provide codes grouped
>>on 3 sources. That way you need nC3 keys, but a disc from
>><=3 sources will have a piece of the data that's identical
>>in all 3 versions and identifies which 3 leaked. If only
>>2 leak, then there will be many more keys to identify
>>which 2. Obviously, this is trivially expandable to any
>>number in a group (in case you think 42 discs will go
>>walkabout).
>
>It's late... can you explain this one from the top?  You want
>to somehow scramble data from 16 different disks into
>each other?  One thing I'm really missing is what exactly
>you're scrambling.  Different source pools?  No that
>can't be it... it's late.

OK. You're sending discs to 4 people: A, B, C and D.
You want to make sure that even if 3 of them leak, you
can ID them.
You hide 4 keys: ABC, ABD, BCD, ACD.
ie: the first key is in A's copy, B's copy and C's copy
but not in D's.

If the cracker gets A's, B's and C's copies and checks
them for differences, they'll detect 3 of the keys:
ABD won't be in C's, BCD won't be in A's and ACD
won't be in B's. But ABC will be the same in all 3
versions. You know that there are 4 codes and where
they are, but the cracker doesn't. If they eliminate
all the differences, key ABC remains and identifies
the 3 culprits.

If only B and C leak, then keys ABC and BCD will
remain.
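
The grouping scheme above can be sketched in a few lines of Python. The function names are invented for illustration, and the sketch models only which keys sit on which disc, not how a key is hidden in the data. A key survives the cracker's diff exactly when it is present in every leaked copy.

```python
from itertools import combinations

def assign_keys(recipients, group_size):
    """Map each recipient to the set of group keys hidden on their disc."""
    groups = [frozenset(g) for g in combinations(recipients, group_size)]
    return {r: {g for g in groups if r in g} for r in recipients}

def surviving_keys(discs, leaked):
    """Keys present in every leaked copy: a diff of the copies cannot see them."""
    return set.intersection(*(discs[r] for r in leaked))

def identify(discs, leaked):
    """Recipients named by every surviving key (empty if no key survives)."""
    keys = surviving_keys(discs, leaked)
    return set.intersection(*(set(k) for k in keys)) if keys else set()

discs = assign_keys("ABCD", 3)           # 4C3 = 4 keys: ABC, ABD, ACD, BCD
three = surviving_keys(discs, "ABC")     # only key ABC survives
two = surviving_keys(discs, "BC")        # keys ABC and BCD survive
culprits = identify(discs, "BC")         # their overlap is B and C
too_many = identify(discs, "ABCD")       # more leaks than the group size
```

With groups of 3, a leak of all four copies leaves no key common to every disc, so too_many comes back empty. That is why the group size has to be at least the number of copies you expect to leak together.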

>>It could work for levels if you use lots and lots of strong
>>encryption and chain levels together (ie: the key for level
>>n is in the data for level n-1) and alter things after some
>>number of levels. They'd crack the first dozen levels, say,
>>figure it's working, but have missed that the encryption
>>method changes for level 13..
>
>The decryption method is always contained in the binary
>somewhere.  So you're just presuming about the
>carefulness/carelessness of the cracker.

Yup.

>There are a million needle-in-a-haystack strategies out
>there, all can be defeated with sufficient patience.

Yup. But who cares? If it takes them 2 months to crack
the game that's 2 months of sales and then the game
is 'old hat'.

>Just a matter of how much you enjoyed writing it, vs. how
>much the cracker enjoys cracking it.

Yup.

>>>No, you don't want it in the raw data.  The raw data is easy
>>>to perturb randomly and still get basically the same data.
>>
>>That depends how you place it. Ultra low frequency
>>components across the whole dataset would be much
>>more difficult to remove. Imagine a sample with a 4Hz
>>sine wave mixed in - it would be undetectable by ear
>>and noise wouldn't remove it.
>
>So you run it under an audio filter and chop anything the ear
>can't hear. Then you do some nasty things to what you can
>hear, so that the data is now sufficiently different.

And who would go to all that trouble?
[...]
>>>Somewhere that takes quite a while to figure out how
>>>to transform without breaking everything.
>>
>>But, as you've said, there are crackers willing to spend
>>any amount of time doing the grunt work..
>
>Well, what kind of index file would be so onerous that only
>the most foolhardy cracker would attempt it?

I have no idea. All indices seem fairly simple formats to
me.

---
Russ

>>>>No, you don't want it in the raw data.  The raw data is easy
>>>>to perturb randomly and still get basically the same data.
>>>
>>>That depends how you place it. Ultra low frequency
>>>components across the whole dataset would be much
>>>more difficult to remove. Imagine a sample with a 4Hz
>>>sine wave mixed in - it would be undetectable by ear
>>>and noise wouldn't remove it.
>>
>>So you run it under an audio filter and chop
>>anything the ear can't hear. Then you do some nasty
>>things to what you can hear, so that the data is now
>>sufficiently different.
>
>And who would go to all that trouble?

One other point that I missed before: what about
low-frequency components in *images*? There is no
'outside the seeing range' in this case.

And as for "sufficiently different", these methods are
so subtle that to be sure it's code free the data would
need to be replaced by white/pink noise - hardly worth
the effort of cracking if that's what you end up with.
Most crackers would simply rely on anonymity at
both ends and let the leaks fend for themselves (or
distribute the final version from shops).

---
Russ

Brandon Van Every <vanevery@blarg.net> wrote:
>Russ Williams wrote:
>>One other point that I missed before: what about
>>low-frequency components in *images*? There is
>>no 'outside the seeing range' in this case.
>
>Sure there is.  Darkest part of the image, you don't
>need it.  People scale the luminescence of photos
>all the time.

But this won't get rid of it. Similar watermarking
techniques are capable of detecting a copy of an
image that's been altered (blur/soften/sharpen/etc.),
printed into a newspaper and scanned back into
a computer. Simple bit flipping or scaling just won't
cut it.

---
Russ