[MLIR-AIR] Apply outer perm for unpack for air in 4-level tiling #1145

Open
wants to merge 2 commits into base: main

Conversation

erwei-xilinx
Contributor

  • This is needed to get numerically correct results from the mlir-air pipeline when using the pack-peel-4-level-tiling scheme.

@yzhang93
Contributor

I've also seen the flaky failure on the Linux CI runner when compiling third_party/mlir-air today. Could you please double-check what caused the problem?

@erwei-xilinx
Contributor Author

> I've also seen the flaky failure on the Linux CI runner when compiling third_party/mlir-air today. Could you please double-check what caused the problem?

`clang: error: clang frontend command failed with exit code 135 (use -v to see invocation)` seems to be the issue causing the failures. 135 is segfault. But the segfault seems to happen at different places in different CI runs... The same issue hasn't occurred on the mlir-air CI machine before.
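As a side note on reading that exit code: assuming the CI runner follows the usual shell convention of reporting a signal-terminated process as 128 plus the signal number, 135 maps to signal 7, which is SIGBUS on Linux x86-64, whereas a segfault (SIGSEGV, signal 11) would normally show up as 139 and a kill by the Linux OOM killer (SIGKILL, signal 9) as 137. The helper names below are hypothetical and the signal table covers only a few Linux x86-64 values; this is a minimal sketch, not part of the PR.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: recover the signal number from a shell-style exit code.
// Assumes the "128 + N" convention used by common shells for signal N.
std::optional<int> signalFromExitCode(int exitCode) {
  if (exitCode > 128 && exitCode <= 128 + 64) return exitCode - 128;
  return std::nullopt;  // normal exit, or not a signal-style code
}

// Hypothetical helper: name a few signals using Linux x86-64 numbering.
// Other platforms number some signals differently.
std::string linuxSignalName(int sig) {
  switch (sig) {
    case 6:  return "SIGABRT";
    case 7:  return "SIGBUS";
    case 9:  return "SIGKILL";
    case 11: return "SIGSEGV";
    default: return "signal " + std::to_string(sig);
  }
}
```

For example, `linuxSignalName(*signalFromExitCode(135))` yields `"SIGBUS"` under these assumptions.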

@yzhang93
Contributor

> > I've also seen the flaky failure on the Linux CI runner when compiling third_party/mlir-air today. Could you please double-check what caused the problem?

> `clang: error: clang frontend command failed with exit code 135 (use -v to see invocation)` seems to be the issue causing the failures. 135 is segfault. But the segfault seems to happen at different places in different CI runs... The same issue hasn't occurred on the mlir-air CI machine before.

A rerun helps, but it happened very frequently today on several PRs...

@erwei-xilinx
Contributor Author

> A rerun helps, but it happened very frequently today on several PRs...

I can go through the mlir-air repo and see if I can remove some redundant code to reduce the codebase size.

@yzhang93
Contributor

> > A rerun helps, but it happened very frequently today on several PRs...

> I can go through the mlir-air repo and see if I can remove some redundant code to reduce the codebase size.

Thanks. Do you think it's OOM? Did you add a lot of code compared to your previous bump?

auto maybePackPeelTiling =
    ParameterSetting::create(linalgOp, /*isObjectFifo=*/true, targetDevice,
                             numRows, numCols, /*kPackScaleL1=*/2);
    ParameterSetting::create(linalgOp, isObjectFifo, targetDevice, numRows,
Contributor

Are you sure this is enough to get correct numerics? You have to modify the outer perm similar to this

inside setRootConfigForPackPeel4LevelTilingPipeline.
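To illustrate why a mismatched outer permutation can break numerics, here is a toy 2-D pack/unpack in plain C++. It is a generic sketch with hypothetical names (`PackSpec`, `tileOffset`, `pack`, `unpack`), assuming tile sizes that evenly divide the matrix; it is not the iree-amd-aie or mlir-air implementation. The point it shows is that data packed with a swapped outer tile order only round-trips if the unpack uses the same outer order.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical description of a 2-D packing; not taken from the PR.
struct PackSpec {
  std::size_t rows, cols;    // source shape, assumed divisible by the tile
  std::size_t tileR, tileC;  // inner tile shape
  bool swapOuter;            // false: outer order [r, c]; true: [c, r]
};

// Flat offset of tile (tr, tc) in the packed buffer, honoring the outer perm.
std::size_t tileOffset(const PackSpec &s, std::size_t tr, std::size_t tc) {
  std::size_t outerR = s.rows / s.tileR, outerC = s.cols / s.tileC;
  std::size_t tileIndex = s.swapOuter ? tc * outerR + tr : tr * outerC + tc;
  return tileIndex * s.tileR * s.tileC;
}

// Copy a row-major matrix into tile-major layout.
std::vector<int> pack(const std::vector<int> &src, const PackSpec &s) {
  std::vector<int> dst(src.size());
  for (std::size_t r = 0; r < s.rows; ++r)
    for (std::size_t c = 0; c < s.cols; ++c) {
      std::size_t off = tileOffset(s, r / s.tileR, c / s.tileC);
      dst[off + (r % s.tileR) * s.tileC + (c % s.tileC)] = src[r * s.cols + c];
    }
  return dst;
}

// Inverse of pack, valid only when given the same PackSpec.
std::vector<int> unpack(const std::vector<int> &packed, const PackSpec &s) {
  std::vector<int> dst(packed.size());
  for (std::size_t r = 0; r < s.rows; ++r)
    for (std::size_t c = 0; c < s.cols; ++c) {
      std::size_t off = tileOffset(s, r / s.tileR, c / s.tileC);
      dst[r * s.cols + c] = packed[off + (r % s.tileR) * s.tileC + (c % s.tileC)];
    }
  return dst;
}
```

In this sketch, unpacking with `swapOuter` set differently from the pack step places off-diagonal tiles in the wrong position, which is the kind of silent numerical mismatch a matmul test in CI would catch.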

Contributor

If you like, you can add a test in CI to make sure it's running.

Contributor Author

Oh cool, that would be great! Will add a matmul test to make sure.

@erwei-xilinx
Contributor Author

> Thanks. Do you think it's OOM? Did you add a lot of code compared to your previous bump?

Yeah, I suspect it is OOM, as I've noticed that the segfault happens when more than one CI job gets triggered at the same time. I didn't add a lot of code recently, but the codebase has become pretty big. AIRDependencyScheduleOpt.cpp, for example, is >5000 lines...

@newling
Contributor

newling commented Feb 27, 2025

> clang: error: clang frontend command failed with exit code 135 (use -v to see invocation)

FWIW I saw this last week once, I'd never seen it before (ever ever).

3 participants