R-Car/Merging-MMC-block-requests

From eLinux.org

Overview

Linux kernel v5.4 supports merging MMC block requests by using the IOMMU. Enabling this feature on R-Car Gen3 can improve MMC read/write performance.

Without this feature

  • The Linux block layer cannot merge any bio.
  • Since max_segs of the R-Car Gen3 SDHI is 1, the Linux block layer prepares only one scatter-gather (sg) entry.

With this feature

  • The Linux block layer can merge bios if the filesystem data is aligned to the IOMMU page size (e.g. 4 KB).
  • Even though max_segs of the R-Car Gen3 SDHI is 1, this feature can map the pages as one contiguous region if the data is aligned.
    • As a result, fewer read/write commands are issued than without this feature, so the command overhead is reduced and MMC read/write performance can be improved.
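
The alignment condition can be illustrated with a small userspace sketch. This is not kernel code: count_segments is a hypothetical helper, the chunk addresses are made up, and the rule it applies (two adjacent chunks can share one segment only when the first ends and the second starts on a page boundary) is a simplified reading of how a merge boundary behaves.

```shell
#!/bin/sh
# Userspace illustration only (hypothetical helper, not kernel code).
# Usage: count_segments PAGE_SIZE ADDR:LEN [ADDR:LEN ...]
# Prints how many segments remain after merging neighbouring chunks
# whose shared edge lies on a PAGE_SIZE boundary on both sides.
count_segments() {
    page_size=$1
    shift
    mask=$((page_size - 1))
    segs=0
    prev_end_aligned=0
    for chunk in "$@"; do
        addr=${chunk%%:*}
        len=${chunk##*:}
        # 1 if this chunk starts on a page boundary, else 0
        start_aligned=$(( (addr & mask) == 0 ))
        # A new segment is needed for the first chunk, or when either
        # side of the joint is not page aligned.
        if [ "$segs" -eq 0 ] || [ "$prev_end_aligned" -ne 1 ] || [ "$start_aligned" -ne 1 ]; then
            segs=$((segs + 1))
        fi
        # 1 if this chunk ends on a page boundary, else 0
        prev_end_aligned=$(( ((addr + len) & mask) == 0 ))
    done
    echo "$segs"
}
```

With a 4096-byte page, count_segments 4096 0:4096 8192:4096 20480:4096 prints 1, because every joint is page aligned. Shortening the middle chunk to 8192:1024 prints 2, because that chunk no longer ends on a page boundary.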

Remarks

  • Even with this feature enabled, the Linux block layer cannot merge bios if the filesystem data is not aligned (e.g. ext4 on a small partition created with -J size=1).

Related commits

158a6d3 iommu/dma: add a new dma_map_ops of get_merge_boundary()
6ba9941 dma-mapping: introduce dma_get_merge_boundary()
38c38cb mmc: queue: use bigger segments if DMA MAP layer can merge the segments
45147fb block: add a helper function to merge the segments

How to enable the feature on R-Car Gen3?

Kernel configuration

The following kernel configuration options must be enabled.

CONFIG_MMC=y
CONFIG_MMC_BLOCK=y
CONFIG_MMC_SDHI=y
CONFIG_MMC_SDHI_INTERNAL_DMAC=y
CONFIG_IOMMU_SUPPORT=y
CONFIG_IPMMU_VMSA=y
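
A minimal sketch for checking these options against a kernel config file. check_config is a hypothetical helper name, and the path of the config file is up to you (for example the .config in the build tree, or a copy extracted from /proc/config.gz, which exists only if the running kernel was built with CONFIG_IKCONFIG_PROC).

```shell
#!/bin/sh
# Hypothetical helper: report which of the required options are not
# set to "y" in the given kernel config file.
# Usage: check_config PATH_TO_CONFIG
check_config() {
    cfg=$1
    missing=""
    for opt in CONFIG_MMC CONFIG_MMC_BLOCK CONFIG_MMC_SDHI \
               CONFIG_MMC_SDHI_INTERNAL_DMAC CONFIG_IOMMU_SUPPORT \
               CONFIG_IPMMU_VMSA; do
        # Only an exact "<option>=y" line counts; "=m" is reported as missing.
        grep -q "^${opt}=y\$" "$cfg" || missing="$missing $opt"
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}
```

The function prints ok when all six options are set to y, and otherwise prints the list of options that still need to be enabled.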

Modify a driver

We need to modify rcar_gen3_slave_whitelist in drivers/iommu/ipmmu-vmsa.c as shown below.

v5.8 or earlier

static const char * const rcar_gen3_slave_whitelist[] = {
	"ee100000.sd",
	"ee120000.sd",
	"ee140000.sd",
	"ee160000.sd",
};

v5.9 or later

static const char * const rcar_gen3_slave_whitelist[] = {
	"ee100000.mmc",
	"ee120000.mmc",
	"ee140000.mmc",
	"ee160000.mmc",
};
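
The only difference between the two variants is the device name suffix. As a rough aid, the sketch below picks the suffix from a kernel version string, based purely on the v5.8/v5.9 split stated above. sdhi_whitelist_suffix is a hypothetical helper, and vendor kernels with backports may not follow this split, so the device names under /sys/bus/platform/devices on the target remain the more reliable source.

```shell
#!/bin/sh
# Hypothetical helper: print "sd" for kernels up to v5.8 and "mmc" for
# v5.9 or later, following the version split described on this page.
# Usage: sdhi_whitelist_suffix VERSION   (e.g. "$(uname -r)")
sdhi_whitelist_suffix() {
    ver=$1
    major=${ver%%.*}          # text before the first dot
    rest=${ver#*.}            # text after the first dot
    minor=${rest%%[!0-9]*}    # leading digits of the remainder
    if [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 9 ]; }; then
        echo "mmc"
    else
        echo "sd"
    fi
}
```

For example, sdhi_whitelist_suffix 5.8.18 prints sd and sdhi_whitelist_suffix 5.10.0 prints mmc.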

How to confirm?

After the kernel has booted, we can confirm via sysfs whether the feature is enabled.

# ls /sys/kernel/iommu_groups/0/devices/
ee100000.mmc  ee140000.mmc  ee160000.mmc
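
The group number is not necessarily 0 on every board, so a small sketch that scans all IOMMU groups may be handy. list_iommu_sdhi is a hypothetical helper. It assumes the SDHI devices can be recognised by their .sd or .mmc name suffix, as in the whitelist entries above.

```shell
#!/bin/sh
# Hypothetical helper: list devices with a ".sd" or ".mmc" suffix that
# are attached to any IOMMU group.
# Usage: list_iommu_sdhi [SYSFS_ROOT]   (SYSFS_ROOT defaults to /sys)
list_iommu_sdhi() {
    sysfs=${1:-/sys}
    for dev in "$sysfs"/kernel/iommu_groups/*/devices/*; do
        # Skip the unexpanded pattern when no IOMMU group exists.
        [ -e "$dev" ] || continue
        name=${dev##*/}
        case "$name" in
            *.sd|*.mmc) echo "$name" ;;
        esac
    done
}
```

Running list_iommu_sdhi without arguments on the target should print the SDHI devices if the whitelist change took effect, and nothing otherwise. As a second check, assuming the card appears as mmcblk0, the value in /sys/block/mmcblk0/queue/max_segments should be larger than 1 when merging is in use.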

Performance measurement

The following table shows performance measurements on v5.1-rc6[1].

Environment              Sequential Output (KB/sec)  Sequential Input (KB/sec)
H3 without this feature  117,133                     118,682
H3 with this feature     130,482                     195,727
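
To put the figures in relative terms, the gain can be computed directly from the table. gain_percent is a hypothetical helper that uses awk for the floating-point arithmetic. The inputs are the table values without thousands separators.

```shell
#!/bin/sh
# Hypothetical helper: relative gain in percent, rounded to one decimal.
# Usage: gain_percent WITHOUT_FEATURE WITH_FEATURE
gain_percent() {
    awk -v o="$1" -v n="$2" 'BEGIN { printf "%.1f\n", (n - o) * 100 / o }'
}

gain_percent 117133 130482   # sequential output
gain_percent 118682 195727   # sequential input
```

This prints 11.4 and 64.9, i.e. sequential output improves by about 11% and sequential input by about 65% in this measurement.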

Reference