SoC developers and system designers are evaluating hardware acceleration options as AI and machine learning applications move from cloud-based algorithms to dedicated hardware. Because these algorithms already support multicore execution, the tradeoffs center on the structure of the processor array and the performance required at each node. Beyond the flexibility of the open RISC-V ISA, which allows core features to be configured to match the compute requirement, RISC-V offers the option to add custom extensions and instructions that enable a further degree of system optimization. New extensions can target the application workload or serve as dedicated communication channels between cores, nodes, and/or interfaces to the NoC. This paper presents a methodology for evaluating these hardware options through early, software-driven system architectural exploration to uncover the optimum design configurations.
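To make the exploration step concrete, the sweep over candidate configurations can be sketched as a small script. Everything here is a hypothetical illustration: the parameters (core count, vector lanes, a custom MAC instruction) and the analytic throughput and area models are placeholder assumptions, not the models used in the paper.

```python
from itertools import product

def estimate_throughput(cores, lanes, custom_mac):
    """Toy analytic model: relative ops/cycle with diminishing returns on core count."""
    base = cores * lanes                  # MAC lanes across the processor array
    scaling = cores ** -0.1               # crude contention penalty as the array grows
    boost = 1.5 if custom_mac else 1.0    # assumed speedup from a custom MAC extension
    return base * scaling * boost

def estimate_area(cores, lanes, custom_mac):
    """Toy area model in arbitrary units: per-core cost grows with lanes and extensions."""
    per_core = 1.0 + 0.2 * lanes + (0.3 if custom_mac else 0.0)
    return cores * per_core

def explore(core_opts, lane_opts):
    """Enumerate the design space and rank candidates by throughput per unit area."""
    results = []
    for cores, lanes, mac in product(core_opts, lane_opts, (False, True)):
        perf = estimate_throughput(cores, lanes, mac)
        area = estimate_area(cores, lanes, mac)
        results.append({"cores": cores, "lanes": lanes, "custom_mac": mac,
                        "perf": perf, "area": area,
                        "perf_per_area": perf / area})
    return sorted(results, key=lambda r: r["perf_per_area"], reverse=True)

# Sweep a small hypothetical design space and report the most area-efficient point.
ranked = explore(core_opts=(1, 2, 4, 8), lane_opts=(4, 8, 16))
best = ranked[0]
print(best["cores"], best["lanes"], best["custom_mac"])
```

In a real flow, the analytic models would be replaced by cycle-accurate or instruction-level simulation of the candidate cores running the target workload, but the structure of the search stays the same: enumerate configurations, score each one in software, and rank before committing to hardware.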