[Apache TVM Discuss] [Questions] Removing intermediate tensors/replacing them with temporary variables

I am generating C code using Tensor Expression (TE). Currently, a lot of intermediate tensors are generated along the way, and I would like to know whether I can use, e.g., scalar temporary variables instead.

Specifically, I am computing Conv, then batch norm (BN), then ReLU. For performance, I would like to inline BN and ReLU into the Conv loop (or the Conv and BN loops into the ReLU loop; anything that leaves only a single loop nest).
However, I cannot inline the Conv loop into the other loops because Conv contains a reduction, and TVM throws an error saying that reductions are only allowed at the top level.
So currently I inline the BN stage and use compute_at() to merge the Conv loop into the ReLU loop.
For example, I am doing something like this in my tensor expression code:

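    from tvm import te

    A = te.placeholder(...)
    W = te.placeholder(...)
    Conv = te.compute(...)  # my conv code from A, W (contains a te.sum reduction)
    BN = te.compute(...)    # BatchNorm using Conv
    Relu = te.compute(...)  # ReLU over BN
    s = te.create_schedule([Relu.op])
    s[BN].compute_inline()                        # fold BN into its consumer (Relu)
    s[Conv].compute_at(s[Relu], Relu.op.axis[3])  # compute Conv inside Relu's 4th output axis

The generated C code looks something like this: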

        Conv[...] = 0;
        for (...)
          for (...)
            for (...)
              Conv[...] += A[...] * W[...]  // Conv loop
        Relu[...] = ReLU(BN(Conv[...])) 

However, I only need the results in the Relu[...] array, so Conv[...] would be better replaced by something like a scalar variable. That is, I want the code to look like this:

        tmp = 0;
        for (...)
          for (...)
            for (...)
              tmp += A[...] * W[...]  // Conv loop
        Relu[...] = ReLU(BN(tmp))
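
For reference, here is a plain-Python sketch (not TVM output) of the two loop nests above. The details are hypothetical and chosen only for illustration: a single image in [C][H][W] layout, stride 1 and no padding, weights in [K][C][R][S] layout, and inference-mode batch norm with per-channel (mean, var, gamma, beta). It illustrates only the intended change, accumulating the reduction into a scalar `tmp` instead of a `Conv` array.

```python
import math

# Hypothetical layout for illustration: A is [C][H][W] (one image, stride 1,
# no padding), Wt is [K][C][R][S], and bn[k] = (mean, var, gamma, beta).

def bn_relu(x, mean, var, gamma, beta, eps=1e-5):
    """Inference-mode batch norm followed by ReLU on one scalar."""
    y = gamma * (x - mean) / math.sqrt(var + eps) + beta
    return max(y, 0.0)

def _dims(A, Wt):
    """Return C, K, R, S and the output height/width."""
    C, H, Wd = len(A), len(A[0]), len(A[0][0])
    K, R, S = len(Wt), len(Wt[0][0]), len(Wt[0][0][0])
    return C, K, R, S, H - R + 1, Wd - S + 1

def conv_bn_relu_buffered(A, Wt, bn):
    """Like the current output: the reduction accumulates into a Conv array."""
    C, K, R, S, OH, OW = _dims(A, Wt)
    Conv = [[[0.0] * OW for _ in range(OH)] for _ in range(K)]
    Relu = [[[0.0] * OW for _ in range(OH)] for _ in range(K)]
    for k in range(K):
        for i in range(OH):
            for j in range(OW):
                Conv[k][i][j] = 0.0
                for c in range(C):
                    for r in range(R):
                        for s in range(S):
                            Conv[k][i][j] += A[c][i + r][j + s] * Wt[k][c][r][s]
                Relu[k][i][j] = bn_relu(Conv[k][i][j], *bn[k])
    return Relu

def conv_bn_relu_scalar(A, Wt, bn):
    """Like the desired output: the reduction accumulates into a scalar tmp."""
    C, K, R, S, OH, OW = _dims(A, Wt)
    Relu = [[[0.0] * OW for _ in range(OH)] for _ in range(K)]
    for k in range(K):
        for i in range(OH):
            for j in range(OW):
                tmp = 0.0
                for c in range(C):
                    for r in range(R):
                        for s in range(S):
                            tmp += A[c][i + r][j + s] * Wt[k][c][r][s]
                Relu[k][i][j] = bn_relu(tmp, *bn[k])
    return Relu
```

Both functions perform the same arithmetic in the same order, so they return identical results; the only difference is that the scalar version never allocates the K x OH x OW Conv array.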

Is there a simple way to do this with TE?
The explanation is a bit messy, so I hope this makes sense.

Thank you!

