Re: [Hampshire] help with algorithm

Top Page

Reply to this message
Author: James Courtier-Dutton
Date:  
To: Hampshire LUG Discussion List
Subject: Re: [Hampshire] help with algorithm
On 24 July 2010 23:09, Hugo Mills <hugo@???> wrote:
> On Sat, Jul 24, 2010 at 09:21:57AM +0100, James Courtier-Dutton wrote:
>> I am writing a open source tool that decompiles binary code.
>> I have it working quite well, taking the binary code and outputting  a
>> .c program.
>> I call it "libbeauty" and it is hosted here: http://revenge.berlios.de/
>>
>> I am trying to come up with an algorithm that will discover the size
>> of the data array.
>
>   Don't bother, you're on a hiding to nothing with that one.

What is "hiding to nothing" ?

>
>   If you examine the code, you'll find that the memory for
> locally-scoped arrays array is typically allocated on the stack. IIRC,
> on x86 (and certainly on ARM, which is where my experience in
> decompiling things comes from), the standard stack grows *downwards*.
> This means that if you keep a map of the contents of the stack
> below(*) the current stack frame, you can look at the base address of
> the array, and take the difference between that and the next thing
> up(**) on the stack, divide by sizeof(array_type), and Robert's your
> uncle's live-in lover, as they say.
>
> (*) below in terms of memory address. Actually "above" in terms of stack growth.
> (**) up by memory, down by stack growth...
>
>   Keeping said map should be relatively easy, because every variable
> in local scope will either be on the stack (and accessed at SP+i, for
> some i), or will only be allocated to a register (in which case you
> don't care about it, for these purposes).
>
>> Can anyone come up with a better algorithm?
>
>   HTH,
>   Hugo.
>


Thank you for the response.
There are several areas where data array sizes might behave differently.
1) Global data segment.
2) Local variables on the stack.
3) data on the heap (assigned via malloc() etc.)

I was specifically trying to look at (1).

You are correct, the size of the stack frame is a good area where one
might infer bounds on an array.
I already have fairly good handling of local variables in my code.
My code uses SSA (Single static assignment) that correctly handles the
local variable, be it on the stack or just in a register.
It also handles function parameters well.

If I don't find any other ideas, I will at least try using the
"contraints" model combined with the "data scope" method.
The "data scope" method being the setting of the entire ranges to data
to be data[1000] (if 1000 is the full size of the data segment)
The as the code references global variables within the uint32_t
data[1000]; it can then be refined but splitting it up into uint32_t
data1[400]; uint32_t data2[600];

Kind Regards

James