
Different Model Outputs on Android vs. iOS with tfjs #8342

Open
ertan95 opened this issue Jul 24, 2024 · 8 comments
Assignees
Labels
comp:react-native type:bug Something isn't working

Comments

@ertan95

ertan95 commented Jul 24, 2024

I am experiencing a significant discrepancy in the logits and probabilities output when running the same TensorFlow.js model on Android and iOS. The model, a self-trained MobileNetV3 (large) from PyTorch, performs as expected on iOS and in a PyTorch Jupyter Notebook but produces different results on Android. I convert the model from PyTorch to ONNX and then to TensorFlow.js.

To troubleshoot, I saved the preprocessed tensor from Android and used it on iOS, where it worked correctly. Conversely, the iOS tensor failed on Android. This rules out preprocessing issues, suggesting either improper weight handling on Android or an issue with the predict function.

System information

  • iOS device: iPhone 15 Pro; Android device: Samsung Galaxy S20 (SM-G980F)

Snippet from my package.json:

"@tensorflow/tfjs": "^4.15.0",
"@tensorflow/tfjs-core": "^4.15.0",
"@tensorflow/tfjs-react-native": "^1.0.0",

Describe the current behavior
The model outputs consistent and expected results on iOS and in the Jupyter Notebook but produces different and incorrect results on Android.

Describe the expected behavior
The model should produce consistent logits and probabilities across all platforms, including Android, as it does on iOS and in the PyTorch Jupyter Notebook.

Standalone code to reproduce the issue

// Imports implied by the snippets below (not in the original post; assuming an
// Expo React Native setup, which the bundleResourceIO/FileSystem usage suggests):
import * as tf from '@tensorflow/tfjs';
import { decodeJpeg, bundleResourceIO } from '@tensorflow/tfjs-react-native';
import * as FileSystem from 'expo-file-system';
import { manipulateAsync, SaveFormat } from 'expo-image-manipulator';

const savePredictions = async (logits, probabilities, fileName, variants, processedImage) => {
  try {
    const logitsData = await logits.array();
    const probabilitiesData = await probabilities.array();
    const processedImageData = await processedImage.array();

    const predictionsJSON = {
      variants: variants,
      processedImage: processedImageData,
      logits: logitsData,
      probabilities: probabilitiesData,
    };
    const tensorJSON = JSON.stringify(predictionsJSON);
    await FileSystem.writeAsStringAsync(fileName, tensorJSON);
    console.log('Predictions saved:', fileName, 'in', FileSystem.documentDirectory);
  } catch (error) {
    console.error('Error:', error);
  }
};

const processImage = async (uri: string): Promise<tf.Tensor> => {
  try {
    // rescale picture to model trained picture size
    const resizedImg = await manipulateAsync(
      uri,
      [{ resize: { width: trainingSizes.img_width, height: trainingSizes.img_height } }],
      { compress: 0.6, format: SaveFormat.JPEG, base64: true }
    );

    const imageTensor = tf.tidy(() => {
      // resizedImg.base64 is already a bare base64 string; decode it to bytes
      const uint8array = tf.util.encodeString(resizedImg.base64, 'base64').buffer;
      let tensor = decodeJpeg(new Uint8Array(uint8array));
      // NOTE: the original snippet mixed trainingSizes/trainingSizesEN here;
      // unified to trainingSizes. The bilinear resize is redundant after
      // manipulateAsync, but kept to match the original flow.
      tensor = tf.image.resizeBilinear(tensor, [
        trainingSizes.img_height,
        trainingSizes.img_width,
      ]);
      tensor = tensor.div(255.0);
      // ImageNet mean/std normalization, then HWC -> NCHW for the PyTorch-trained model
      tensor = tensor
        .sub(tf.tensor1d([0.485, 0.456, 0.406]))
        .div(tf.tensor1d([0.229, 0.224, 0.225]));
      tensor = tensor.transpose([2, 0, 1]).expandDims(0);
      return tensor;
    });

    //console.log('processImage memory:', tf.memory());
    return imageTensor;
  } catch (error) {
    console.error('Error on preprocessing image:', error);
    throw error;
  }
};


const predictImage = async (
  model: tf.GraphModel | null,
  processedImage: tf.Tensor,
  variants: string[],
): Promise<string[]> => {
  try {
    if (!model) {
      throw new Error('Model not loaded');
    }
    //Overwrite processedImage with test data
    /*
    const testTensorData: number[][][][] = predictionJSON_Android[
      'processedImage'
    ] as number[][][][];
    const testTensor = tf.tensor4d(testTensorData);
    processedImage = testTensor;
    */

    // Mask non-relevant classes
    const maskArray = Object.values(classLabels).map((label) =>
      variants.includes(label) ? 1 : 0
    );
    const maskTensor = tf.tensor(maskArray, [1, maskArray.length]);
    const modelInput = { input: processedImage, mask: maskTensor };
    const tidyResult = tf.tidy(() => {
      const logits = model.predict(modelInput) as tf.Tensor;
      const probabilities = tf.softmax(logits);
      return { logits, probabilities };
    });

    await savePredictions(
      tidyResult.logits,
      tidyResult.probabilities,
      FileSystem.documentDirectory + 'prediction.json',
      variants,
      processedImage
    );

    tidyResult.logits.dispose();
    maskTensor.dispose();
    tf.dispose(processedImage);

    const predictionArrayBuffer = await tidyResult.probabilities.data();
    tidyResult.probabilities.dispose();

    const predictionArray = Array.from(predictionArrayBuffer);
    const classLabelsArray = Object.values(classLabels);

    const variantPredictions = predictionArray
      .map((probability, index) => ({ label: classLabelsArray[index], probability }))
      // the original filtered on cardVariants (defined elsewhere in the app);
      // filtering on the variants parameter keeps the snippet self-contained
      .filter((prediction) => variants.includes(prediction.label))
      .sort((a, b) => b.probability - a.probability);

    variantPredictions.forEach((variant) => {
      console.log(`Probability for ${variant.label}: ${variant.probability}`);
    });

    const sortedLabels = variantPredictions.map((prediction) => prediction.label);
    return sortedLabels;
  } catch (error) {
    console.error('Error on prediction:', error);
    throw error;
  }
};

....
//Loading model
const loadModel = async () => {
  try {
    const ioHandler = bundleResourceIO(modelJson as tf.io.ModelJSON, [
      modelWeights1,
      modelWeights2,
      modelWeights3,
      modelWeights4,
    ]);
    const model = await tf.loadGraphModel(ioHandler);
    return model;
  } catch (error) {
    console.error('Error on loading model:', error);
    return null;
  }
};
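
To tie the snippets together, a hypothetical call sequence (a sketch only, using loadModel, processImage, and predictImage as defined above; the image URI and variant labels are placeholders):

const run = async (uri: string) => {
  const model = await loadModel();
  const image = await processImage(uri);
  // 'variantA'/'variantB' are placeholder class labels
  const ranked = await predictImage(model, image, ['variantA', 'variantB']);
  console.log('ranked variants:', ranked);
};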

Other info / logs
prediction_ios.json
prediction_ios_with_android_tensor.json
prediction_android.json
prediction_android_with_ios_tensor.json
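
For reference, a quick way to quantify the gap between the attached dumps (a minimal Node/TypeScript sketch, not part of the original report; the file names come from the attachments above and the JSON shape from savePredictions):

import * as fs from 'fs';

// Both files were written by savePredictions, so `logits` is a nested
// array of shape [1, numClasses].
const ios = JSON.parse(fs.readFileSync('prediction_ios.json', 'utf8'));
const android = JSON.parse(fs.readFileSync('prediction_android.json', 'utf8'));

const a: number[] = ios.logits.flat();
const b: number[] = android.logits.flat();

// Largest per-class deviation between the two platforms.
const maxAbsDiff = a.reduce((m, v, i) => Math.max(m, Math.abs(v - b[i])), 0);
console.log('max |logit(iOS) - logit(Android)|:', maxAbsDiff);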

@ertan95 ertan95 added the type:bug Something isn't working label Jul 24, 2024
@ertan95 ertan95 changed the title Discrepancy in Logits/Probabilities on Android vs. iOS using TensorFlow.js Different Model Outputs on Android vs. iOS with tfjs Jul 24, 2024
@gaikwadrahul8 gaikwadrahul8 self-assigned this Jul 24, 2024
@oleksandr-ravin

I have the same problem on all devices based on Samsung Exynos chipsets, and I can reproduce it on a POCO C40; on Snapdragon or iOS devices it works as expected. The project runs on Angular with models converted for tfjs 4.17.0...

try {
  this.model = await tfconv.loadGraphModel(path, {
    requestInit,
    onProgress: (value) => {
      console.log('[MODEL] loading: ' + getModelNameFromPath(path) + ': ' + (value * 100) + '%');
      this.updateLoadingProgress(value);
    },
  });
} catch (e) {
  console.error('[MODEL] error loading model', e);
}

const res = await modelLoad.executeAsync({[config.inputsName]: inputTensor}, config.outputs);

@oleksandr-ravin

oleksandr-ravin commented Jul 26, 2024

I downgraded the libs to version 3.3.0 and it works! On version 3.11.0 the problem is still there. Versions between 3.3.0 and 3.11.0 I didn't check.

@gaikwadrahul8
Contributor

Hi, @ertan95, @oleksandr-ravin

I apologize for the delayed response, and thank you for bringing this issue to our attention with valuable analysis and insights. If possible, could you please share your GitHub repo along with comprehensive steps to reproduce the same behavior on our end, so we can investigate this further?

Thank you for your cooperation and patience.

@oleksandr-ravin

Hi @gaikwadrahul8, for my models:

"outputs": [
  "StatefulPartitionedCall/model_1/zoomin_type/Softmax",
  "StatefulPartitionedCall/model_1/sectors_quality/Sigmoid",
  "StatefulPartitionedCall/model_1/body_type/Softmax",
  "StatefulPartitionedCall/model_1/out_of_distribution/Sigmoid",
  "StatefulPartitionedCall/model_1/spheric_sectors_onehot_encoded/Softmax"
],

I finished testing all versions, and 3.3.0 is the latest one that gives me correct values; we need to find what changed from 3.3.0 to 3.4.0. If I understand correctly, it's a problem with hardware translations and calculations; other phones don't have this trouble.

@ertan95
Author

ertan95 commented Aug 25, 2024

> Hi @gaikwadrahul8, for my models: "outputs": [ "StatefulPartitionedCall/model_1/zoomin_type/Softmax", "StatefulPartitionedCall/model_1/sectors_quality/Sigmoid", "StatefulPartitionedCall/model_1/body_type/Softmax", "StatefulPartitionedCall/model_1/out_of_distribution/Sigmoid", "StatefulPartitionedCall/model_1/spheric_sectors_onehot_encoded/Softmax" ],
>
> I finished testing all versions, and 3.3.0 is the latest one that gives me correct values; we need to find what changed from 3.3.0 to 3.4.0. If I understand correctly, it's a problem with hardware translations and calculations; other phones don't have this trouble.

I will have a look at this one. A downgrade is not the best option for me due to other dependencies, but I will give it a try. Thanks for the solution!

@ertan95
Author

ertan95 commented Aug 25, 2024

> Hi, @ertan95, @oleksandr-ravin
>
> I apologize for the delayed response, and thank you for bringing this issue to our attention with valuable analysis and insights. If possible, could you please share your GitHub repo along with comprehensive steps to reproduce the same behavior on our end, so we can investigate this further?
>
> Thank you for your cooperation and patience.

Well, I've described the steps to reproduce in my initial post. Basically, train a MobileNetV3 and test it on iOS and Android with the same image; you will get different outputs.

@keunhyunkim

keunhyunkim commented Sep 26, 2024

Hi @ertan95, @oleksandr-ravin, @gaikwadrahul8, I also have the same problem.
Here are some test results from my side; I hope they help with fixing this problem.

  1. tfjs version test
    On tfjs 3.3.0, with the webgl backend, I get the right result.
    On tfjs 3.4.0, with the webgl backend, I get the wrong result.
    -> Something that changed from 3.3.0 to 3.4.0 seems to cause this problem.

  2. backend test
    On tfjs 4.20, with the webgl backend, I get the wrong result.
    On tfjs 4.20, with the cpu backend, I get the right result.

  3. CPU chipset test
    On some mobile phones with Exynos, not all, I found this problem.
    I couldn't find this problem with Snapdragon so far.

Considering these test results, I guess there is some incompatibility between the webgl backend and some Exynos chips, introduced by changes applied when updating from tfjs 3.3.0 to 3.4.0.
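
Given the results above, one way to test both hypotheses on an affected device is sketched below (not part of the original comment; runModel stands for whatever runs your model and returns the flat output, and the WebGL precision flags are an assumption about the cause, not a confirmed fix):

import * as tf from '@tensorflow/tfjs';

const diagnoseBackends = async (runModel: () => Promise<Float32Array>) => {
  // Reference run on the CPU backend (slow but deterministic).
  await tf.setBackend('cpu');
  await tf.ready();
  const cpuOut = await runModel();

  // Retry on WebGL with half-float textures disabled, in case the GPU
  // path on this chipset is losing precision.
  await tf.setBackend('webgl');
  tf.env().set('WEBGL_FORCE_F16_TEXTURES', false);
  tf.env().set('WEBGL_RENDER_FLOAT32_ENABLED', true);
  await tf.ready();
  const webglOut = await runModel();

  // Largest elementwise deviation between the two backends.
  const maxDiff = cpuOut.reduce(
    (m, v, i) => Math.max(m, Math.abs(v - webglOut[i])),
    0
  );
  console.log('max |cpu - webgl|:', maxDiff);
};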

@shiomax

shiomax commented Oct 3, 2024

Had similar issues on some Samsung devices. Exynos or not did not seem to be the deciding factor: some Exynos processors work, some don't. The A55 does not work, the A54 does work, and both use Exynos-branded chips. It depends on the exact chip. Changing the version of tfjs did not help at all.

Ended up converting the model to ONNX and using ONNX Runtime Web instead of tfjs, and that works out great on all devices so far. I would hope that even if that one fails on some devices too (which has not happened yet), tfjs would then work, so between the two runtimes I can cover all devices.
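
For anyone wanting to try the same route, a rough shape of that fallback (a sketch only; the model path, input name, and 224x224 input size are placeholders, and the real names come from your exported ONNX graph):

import * as ort from 'onnxruntime-web';

const runOnnx = async (pixels: Float32Array) => {
  // Load the ONNX model exported from PyTorch (path is a placeholder).
  const session = await ort.InferenceSession.create('model.onnx');

  // NCHW float input, matching the preprocessing in the original post.
  const input = new ort.Tensor('float32', pixels, [1, 3, 224, 224]);

  // The feed key must match the graph's input name ('input' is a placeholder).
  const results = await session.run({ input });
  return results[session.outputNames[0]].data as Float32Array;
};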
