Number of options: 16
Number of samples: 500
Languages
es 281
en 207
pt 12
Options
Medidas Provisionales 255
Supervisión de cumplimiento de Sentencia 187
Monitoring compliance with Judgment 24
Precautionary Measures 19
Otros 9
Fondo de asistencia a víctimas 6
Interpretación 2
Medidas Provisorias 2
Outros 1
Victims' Legal Assistance Fund 1
Labels are coming in mixed languages. This is probably because the labels are being taken from the denormalized metadata, and that denormalization is not accounting for translations.
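To illustrate the effect, here is a minimal sketch of how counts would look if translated labels were merged under a language-independent option ID before training. The `LABEL_TO_OPTION_ID` table, the IDs in it, and the helper `count_by_option_id` are hypothetical and not part of Uwazi; the grouping of labels is an assumption based only on how the strings in the log above read. The counts are copied from that log.

```python
from collections import Counter

# Assumed grouping of labels from the log into language-independent option IDs.
# The IDs and the grouping itself are illustrative guesses, not Uwazi data.
LABEL_TO_OPTION_ID = {
    "Supervisión de cumplimiento de Sentencia": "compliance",
    "Monitoring compliance with Judgment": "compliance",
    "Otros": "other",
    "Outros": "other",
}


def count_by_option_id(label_counts):
    """Merge per-label counts into per-option counts; keep unknown labels apart."""
    merged = Counter()
    unmapped = Counter()
    for label, count in label_counts.items():
        option_id = LABEL_TO_OPTION_ID.get(label)
        if option_id is None:
            # Labels without a known option ID are reported, not silently dropped.
            unmapped[label] += count
        else:
            merged[option_id] += count
    return merged, unmapped


# Subset of the counts reported in the log above.
reported = {
    "Supervisión de cumplimiento de Sentencia": 187,
    "Monitoring compliance with Judgment": 24,
    "Otros": 9,
    "Outros": 1,
    "Interpretación": 2,
}
merged, unmapped = count_by_option_id(reported)
print(dict(merged))    # {'compliance': 211, 'other': 10}
print(dict(unmapped))  # {'Interpretación': 2}
```

If the training payload were keyed by option ID rather than by denormalized label, the model would see one class per option instead of one per translation.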
2024-09-04 15:42:47,019 [INFO]
Number of options: 4
Number of samples: 500
Languages
es 281
en 207
pt 12
Options
Corte Interamericana de Derechos Humanos 500 for {"run_name":"cejil","extraction_name":"66d87fa4d5516a82a6f0e1e7","metadata":{}}
The log says "Number of options: 4", but all 500 samples are reported under a single option. Something is off here. cc @gabriel-piles
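A quick way to surface this kind of mismatch is to compare the declared option list with the labels that actually appear in the samples. The sketch below is an assumption about how such a check could look; `check_option_coverage` is a hypothetical helper, and the names "option B", "option C" and "option D" are placeholders, since the log does not show the other three options.

```python
from collections import Counter


def check_option_coverage(declared_options, sample_labels):
    """Compare the declared option list with the labels actually seen in samples."""
    seen = Counter(sample_labels)
    return {
        "declared": len(declared_options),
        "seen": len(seen),
        # Declared options that no sample uses.
        "unused": sorted(set(declared_options) - set(seen)),
        # Labels present in samples but missing from the declared options.
        "undeclared": sorted(set(seen) - set(declared_options)),
    }


# Hypothetical stand-in for the second log: 4 declared options, one label in use.
declared = [
    "Corte Interamericana de Derechos Humanos",
    "option B",  # placeholder, real name not shown in the log
    "option C",  # placeholder
    "option D",  # placeholder
]
samples = ["Corte Interamericana de Derechos Humanos"] * 500
report = check_option_coverage(declared, samples)
print(report["declared"], report["seen"])  # 4 1
print(report["unused"])  # ['option B', 'option C', 'option D']
```

A non-empty `undeclared` list would point at label/translation drift, while a long `unused` list, as here, suggests the samples are not being drawn from the full option set.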
Ensure that training data is being sent from the denormalized data.
If both conditions are met, then this would be a non-issue, and can be closed.
If we are NOT sending the correctly translated / denormalized data, then we need to take steps to ensure that we do.
Maybe the right approach is not relying on denormalized data for critical db-integrity processes?
RafaPolit changed the title from "[IX] - Uwazi is sending chaotic data for training" to "[IX] - Uwazi is sending chaotic data for training (ensure training is using denormalized data)" on Sep 6, 2024
E.g., in this case for selects:
Labels are coming in mixed languages. This is probably because the labels are being taken from the denormalized metadata, and that denormalization is not accounting for translations.
(related) https://github.com/huridocs/ml-backlog/issues/31 #7184