Fast fail when Rscript isn't installed #1290

Closed
bartgrantham opened this issue Mar 1, 2019 · 6 comments

bartgrantham commented Mar 1, 2019

Feature request

Tool(s) involved

Subcommands: CollectInsertSizeMetrics, QualityScoreDistribution, MeanQualityByCycle, CollectBaseDistributionByCycle, CollectGcBiasMetrics, CollectRnaSeqMetrics, CollectRrbsMetrics, CollectMultipleMetrics, CollectWgsMetricsWithNonZeroCoverage (and probably more)

Description

These tools should fail immediately if R/Rscript is not installed with the required dependencies. It's very frustrating to have a tool like CollectMultipleMetrics crank away at a file for hours only to have it fail at the end because R isn't installed and therefore it can't generate plots.

(I shudder to think of all the time and electricity that have been burned by people using Picard for these purposes, only to have it crash at the end when it could have warned them at the start.)

tfenne (Collaborator) commented Mar 1, 2019

I would suggest a different solution. I think Picard tools should detect whether R is present (and whether the required packages are installed) and then: a) log a warning if those dependencies aren't met and b) only attempt plot generation if they are.

bartgrantham (Author) commented

I disagree; that's even worse. As it stands, someone expecting a graphical result at least learns at runtime that it didn't work. If the user is expecting a plot, this kind of silent failure might not be noticed until much later, after they've run a huge batch. If Picard behaved the way you suggest, I would have burned at least a few hundred dollars in cloud computing just to find out that it didn't do what I expected.

Please take into account that many people run these tools in batch as part of larger workflows and simply cannot watch every single line of output. Even for users who are running these tools manually and serially, Picard is so chatty that it would be very easy to miss that warning among all the other output (GATK has the same problem).

Perhaps the creation of plots should be a command line switch (or maybe not generating plots is the switch?), and the error happens if plots are expected and R isn't available.
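
The switch proposed here reduces to a small decision made before any input is read. The following is only a sketch of that logic, not Picard's implementation: the class `PlotPreflight`, the enum `Action`, and both boolean parameters are hypothetical names chosen for illustration.

```java
public class PlotPreflight {
    /** What a tool could do before it starts reading input (hypothetical names). */
    enum Action { RUN_WITH_PLOTS, RUN_WITHOUT_PLOTS, FAIL_FAST }

    /**
     * plotsRequested:   the user asked for plots (via some command line switch).
     * rscriptAvailable: an up-front check found a usable Rscript.
     */
    static Action decide(boolean plotsRequested, boolean rscriptAvailable) {
        if (!plotsRequested) {
            // Metrics only: R is never needed, so its absence is not an error.
            return Action.RUN_WITHOUT_PLOTS;
        }
        // Plots were asked for: either produce them or stop before doing any work.
        return rscriptAvailable ? Action.RUN_WITH_PLOTS : Action.FAIL_FAST;
    }

    public static void main(String[] args) {
        boolean[] options = {true, false};
        for (boolean plotsRequested : options) {
            for (boolean rscriptAvailable : options) {
                System.out.printf("plotsRequested=%b rscriptAvailable=%b -> %s%n",
                        plotsRequested, rscriptAvailable,
                        decide(plotsRequested, rscriptAvailable));
            }
        }
    }
}
```

With this shape, a missing R installation is only an error when a plot was actually requested, so a metrics-only run on a machine without R still succeeds.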

DarioS commented Mar 15, 2019

This happened to me today. It is disappointing that the prerequisite isn't checked at the beginning of the analysis. The documentation of CollectInsertSizeMetrics also doesn't mention it at all.

INFO    2019-03-15 16:08:54     SinglePassSamProgram    Processed   863,000,000 records.  Elapsed time: 00:33:09s.  Time for last 1,000,000:    2s.  Last read position: chrUn_GL000216v2:160,617
Caused by: java.io.IOException: Cannot run program "Rscript": error=2, No such file or directory

If I had known to run module load R before starting the analysis, it would have been easy to get it right the first time. It's more an issue of vague and incomplete documentation.
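
The IOException in the log above is what Java raises when it cannot start the named program, so the same failure can be provoked deliberately at startup instead of after 33 minutes of processing. Below is a minimal sketch of such a probe, assuming Java 9 or later (for `ProcessBuilder.Redirect.DISCARD`) and assuming that `Rscript --version` exits with status 0 on a working installation. The class and method names are hypothetical and are not taken from Picard; the probe also does not check that the required R packages are installed.

```java
import java.io.IOException;

public class RscriptProbe {
    /** Returns true if the command can be started and exits with status 0. */
    static boolean canRun(String... command) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)                       // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD) // discard output (Java 9+)
                    .start();
            return process.waitFor() == 0;
        } catch (IOException e) {
            // Same failure as in the log: the executable could not be started.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        if (canRun("Rscript", "--version")) {
            System.out.println("Rscript is available; plots can be generated.");
        } else {
            System.out.println("Rscript is not available; a tool could stop here, before reading any records.");
        }
    }
}
```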

kbergin (Contributor) commented Nov 30, 2020

@sophiacrennan has made PR #1613 to respond to this issue. If you're still interested in this, please take a look and review!

kbergin (Contributor) commented Dec 3, 2020

This has been merged in. The tools will now throw an error at the start if the user requests the chart but doesn't have R installed.

kbergin closed this as completed Dec 3, 2020

paidi commented Apr 6, 2022

I just logged a new issue related to this discussion: #1793. The solution implemented in #1613 causes problems for some use cases (when someone wants the metrics but not the plot). A better solution (IMO) would be the other approach discussed above, where it would be possible to request that no plots be generated if you want to run on a machine without an R installation.
