A lot of people are calculating a single discrimination/difficulty index for each test group. Even if your math is right, doing that gives you essentially a garbage value. These indexes are to be calculated for EACH ITEM, not each group. Remember also that these indexes are ONLY appropriate for educational tests where each item has a single correct response (like a multiple-choice exam).
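To make the "one pair of indexes per item" point concrete, here is a rough Python sketch. It assumes the common textbook definitions, which may differ from the exact procedure you were given: difficulty is the proportion of all students who answered the item correctly, and discrimination is the number correct in the top-scoring group minus the number correct in the bottom-scoring group, divided by the group size. The function name `item_indices` and the 27% default cut are illustrative assumptions, not a required standard.

```python
def item_indices(responses, group_fraction=0.27):
    """Return one (difficulty, discrimination) pair per item.

    responses: one list per student, holding 1 (correct) or 0 (incorrect)
    for each item. group_fraction is an assumed cut for the upper and
    lower groups (27% is a common convention).
    """
    n_students = len(responses)
    n_items = len(responses[0])

    # Rank students by total score, highest first, then take the
    # top and bottom groups of equal size k.
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(n_students * group_fraction))
    upper, lower = ranked[:k], ranked[-k:]

    results = []
    for i in range(n_items):
        # Difficulty: proportion of ALL students who got item i right.
        difficulty = sum(r[i] for r in responses) / n_students
        # Discrimination: upper-group correct minus lower-group correct,
        # scaled by the group size.
        discrimination = (sum(r[i] for r in upper) - sum(r[i] for r in lower)) / k
        results.append((difficulty, discrimination))
    return results


# Four students, three items; group_fraction=0.5 gives groups of two.
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
for item_number, (p, d) in enumerate(item_indices(scores, 0.5), start=1):
    print(f"Item {item_number}: difficulty={p:.2f}, discrimination={d:.2f}")
```

Notice that the output has one row per item. The upper and lower groups only feed into each item's discrimination value; they never get an index of their own.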
Let’s make an example item. I’ll go through the calculation process, and then we’ll talk about how to structure your results.