Re: [Hampshire] help with algorithm

Top Page

Reply to this message
Author: Chris Dennis
Date:  
To: Hampshire LUG Discussion List
Subject: Re: [Hampshire] help with algorithm
On 24/07/10 09:21, James Courtier-Dutton wrote:
> Hi,
>
> I am writing a open source tool that decompiles binary code.
> I have it working quite well, taking the binary code and outputting a
> .c program.
> I call it "libbeauty" and it is hosted here: http://revenge.berlios.de/
>
> I am having a problem crafting an algorithm.
> Here is some example simple code:
> int m;
> int n;
>
> n = 0;
> loop:
> m = data[n];
> n++;
> If n<= 9 goto loop;
>
> Now, I can easily tell by looking at the code that the size of the
> "data" array is 10. n counts from 0 to 9.
> So, by observation, one can determine that we should have at the top
> of the code: int data[10];
>
> I am trying to come up with an algorithm that will discover the size
> of the data array.
> I have come up with several methods, but none of them are pretty.
> E.g.
> Using constraints.
> "n=0" means that at the point of "loop:" n has initialised to 0.
> One can determine that between "loop:" and the if statement, n is
> increasing and never decreasing. (but there are exceptions to
> consider, e.g. wrap around)
> From the if statement one can determine that when reaching "loop:" n
> is always less or equal to 9.
> So, when one reaches the "m=data[n]" one has the following constraints.
> 1) n initialised to 0;
> 2) n always increases
> 3) n is less than 10.
> We therefore have both bounds, the lower bound and the upper bound,
> and can therefore say "int data[10]"
>
> The above constraints method works very well for this example, but it
> will gets less productive if the code is more complex than a simple
> loop.
>
> Can anyone come up with a better algorithm?


No. What would the algorithm do in this simple example?:

read x,y;
data[x] := y;

There's no general way of knowing what the bounds of the array should
be. (Unless there is bounds-checking code, which there may not be.)

Could your decompiler analyse the data area of the program, and work
out, for example, that the data[] array starts at address A, and no
other variables point to memory between A and B. The size of the array
is then B-A.

Or have I missed the point?

cheers

Chris

-- 
Chris Dennis                                  cgdennis@???
Fordingbridge, Hampshire, UK