Input:
[batchSize, imageY, imageX, channels]
[imageY, imageX]
Kernel:
[kernelY, kernelX, channels, outputChannels]
[kernelY, kernelX]
Output:
[batchSize, imageY2, imageX2, outputChannels]
[imageY2, imageX2]
Without FPGA-side memory
The FPGA compute units are used in a distributed manner,
(batchSize * channels * outputChannels) times in total.
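# NumPy-style indexing, matching the channels-last layouts listed above.
# f maps one [imageY, imageX] plane and one [kernelY, kernelX] kernel
# to one [imageY2, imageX2] result (one use of the FPGA compute units).
for sample in range(batchSize):
    for outputChannel in range(outputChannels):
        for channel in range(channels):
            output[sample, :, :, outputChannel] += f(
                input[sample, :, :, channel],
                kernel[:, :, channel, outputChannel]
            )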
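
The loop above can be tried out in software. The following is a minimal sketch, not the actual FPGA interface: it assumes NumPy arrays in the layouts listed above, and it assumes that f is a stride-1 2D cross-correlation without padding, so that imageY2 = imageY - kernelY + 1 (likewise for X). The notes do not specify either of these. The function name conv_without_fpga_memory and the call counter are hypothetical helpers added for illustration.

```python
import numpy as np


def f(plane, kernel2d):
    """Software stand-in for one FPGA call (assumed: 'valid' cross-correlation, stride 1)."""
    kernelY, kernelX = kernel2d.shape
    imageY2 = plane.shape[0] - kernelY + 1
    imageX2 = plane.shape[1] - kernelX + 1
    result = np.zeros((imageY2, imageX2), dtype=np.result_type(plane, kernel2d))
    # Accumulate one shifted, weighted copy of the plane per kernel element.
    for ky in range(kernelY):
        for kx in range(kernelX):
            result += kernel2d[ky, kx] * plane[ky:ky + imageY2, kx:kx + imageX2]
    return result


def conv_without_fpga_memory(inputs, kernel):
    """Run the triple loop from the notes and count how often f is called."""
    batchSize, imageY, imageX, channels = inputs.shape
    kernelY, kernelX, kernelChannels, outputChannels = kernel.shape
    assert channels == kernelChannels, "input and kernel channel counts must match"
    output = np.zeros(
        (batchSize, imageY - kernelY + 1, imageX - kernelX + 1, outputChannels),
        dtype=np.result_type(inputs, kernel),
    )
    calls = 0
    for sample in range(batchSize):
        for outputChannel in range(outputChannels):
            for channel in range(channels):
                # Every call transfers one input plane and one 2D kernel again,
                # because nothing is kept on the FPGA between calls.
                output[sample, :, :, outputChannel] += f(
                    inputs[sample, :, :, channel],
                    kernel[:, :, channel, outputChannel],
                )
                calls += 1
    return output, calls


inputs = np.ones((2, 5, 5, 3))   # [batchSize, imageY, imageX, channels]
kernel = np.ones((3, 3, 3, 4))   # [kernelY, kernelX, channels, outputChannels]
output, calls = conv_without_fpga_memory(inputs, kernel)
print(output.shape, calls)       # (2, 3, 3, 4) 24
```

The call count equals batchSize * channels * outputChannels (here 2 * 3 * 4 = 24), which is the number of times the compute units are used in the scheme without FPGA-side memory.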