Re: [Hampshire] help with algorithm

Top Page
Author: Hugo Mills
Date:  
To: Hampshire LUG Discussion List
Subject: Re: [Hampshire] help with algorithm

Reply to this message
gpg: failed to create temporary file '/var/lib/lurker/.#lk0x57809100.hantslug.org.uk.19527': Permission denied
gpg: keyblock resource '/var/lib/lurker/pubring.gpg': Permission denied
gpg: Signature made Sun Jul 25 21:33:57 2010 BST
gpg: using DSA key 20ACB3BE515C238D
gpg: Can't check signature: No public key
On Sun, Jul 25, 2010 at 08:42:25PM +0200, James Courtier-Dutton wrote:
> On 24 July 2010 23:09, Hugo Mills <hugo@???> wrote:
> > On Sat, Jul 24, 2010 at 09:21:57AM +0100, James Courtier-Dutton wrote:
> >> I am writing a open source tool that decompiles binary code.
> >> I have it working quite well, taking the binary code and outputting  a
> >> .c program.
> >> I call it "libbeauty" and it is hosted here: http://revenge.berlios.de/
> >>
> >> I am trying to come up with an algorithm that will discover the size
> >> of the data array.
> >
> >   Don't bother, you're on a hiding to nothing with that one.
> What is "hiding to nothing" ?


A fool's errand; doomed to failure.

> >   If you examine the code, you'll find that the memory for
> > locally-scoped arrays array is typically allocated on the stack. IIRC,
> > on x86 (and certainly on ARM, which is where my experience in
> > decompiling things comes from), the standard stack grows *downwards*.
> > This means that if you keep a map of the contents of the stack
> > below(*) the current stack frame, you can look at the base address of
> > the array, and take the difference between that and the next thing
> > up(**) on the stack, divide by sizeof(array_type), and Robert's your
> > uncle's live-in lover, as they say.
> >
> > (*) below in terms of memory address. Actually "above" in terms of stack growth.
> > (**) up by memory, down by stack growth...
> >
> >   Keeping said map should be relatively easy, because every variable
> > in local scope will either be on the stack (and accessed at SP+i, for
> > some i), or will only be allocated to a register (in which case you
> > don't care about it, for these purposes).
> >
> >> Can anyone come up with a better algorithm?
> >
> >   HTH,
> >   Hugo.
> >
>
> Thank you for the response.
> There are several areas where data array sizes might behave differently.
> 1) Global data segment.
> 2) Local variables on the stack.
> 3) data on the heap (assigned via malloc() etc.)
>
> I was specifically trying to look at (1).
>
> You are correct, the size of the stack frame is a good area where one
> might infer bounds on an array.
> I already have fairly good handling of local variables in my code.
> My code uses SSA (Single static assignment) that correctly handles the
> local variable, be it on the stack or just in a register.
> It also handles function parameters well.
>
> If I don't find any other ideas, I will at least try using the
> "contraints" model combined with the "data scope" method.
> The "data scope" method being the setting of the entire ranges to data
> to be data[1000] (if 1000 is the full size of the data segment)
> The as the code references global variables within the uint32_t
> data[1000]; it can then be refined but splitting it up into uint32_t
> data1[400]; uint32_t data2[600];


That's probably the best you can do, in that case, then. Ugly as
all hell, but as long as you work with strong upper bounds, you
probably can't go much wrong.

Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
    --- But somewhere along the line, it seems / That pimp became ---    
                       cool,  and punk mainstream.