EVALUATIONS

Started by lacylu · Feb 27, 2014 · 13 replies

lacylu
Feb 27, 2014 · 12y ago
Original post
During the course of source selection, is it necessary to revise evaluators' individual findings on their worksheets once consensus is complete? Other COs in the office are stating that is the process. My thought is that as long as comments are dispositioned in detail as to why a rating was changed during consensus, that should suffice.

We are having some heated arguments about this topic and any feedback would be appreciated.
Guest Vern Edwards
Feb 27, 2014 · 12y ago
It is neither necessary nor appropriate to change evaluator worksheets to match consensus ratings. Consensus does not mean that all evaluators agree on all factor ratings, only that they can accept the consensus ratings as reasonable. However, you must document the bases for each consensus factor rating.
lacylu
Feb 27, 2014 · 12y ago
Thank you so much Vern,

I was hoping you would respond
lacylu
Feb 27, 2014 · 12y ago
By the way, if there are any GAO cases that support your position, it would be helpful if you could provide them.

I did locate one that discussed the eval findings not being captured in the final report even though they were mitigated.
joel hoffman
Feb 28, 2014 · 12y ago
If the RFP says that the government will develop a consensus rating, why on earth are individual evaluators rating the proposals during their proposal review and initial evaluations? The consensus rating should only be assigned after the team meets as a group and discusses and documents the strengths, weaknesses, deficiencies, required clarifications, etc. found. The rating should be based upon the underlying comments. You should not assign a rating and then back into it by developing comments. There is no need for team members to assign a rating by themselves!

If the individual evaluation sheets go into the SS files you are simply providing fodder for the attorney representing a protestor. I've witnessed some disastrous depositions of board members who tried to explain how the board rated a factor much lower than they did in their initial review. One guy was so 'flustrated' that he forgot his name. When the KO tried to explain in her deposition all the differences between the individual ratings and the consensus, she couldn't do it. She had no technical expertise in the subject matter. She broke down in tears and later tore up her KO warrant and switched jobs. My lesson learned was to never again give the board members a sheet with a space to fill in the rating. A new KO was assigned along with a totally new Board, who reached the same conclusions that the first board did. This time the Goverment's prevailed in the second protest.
Guest Vern Edwards
Feb 28, 2014 · 12y ago
lacylu:

From SRA International, Inc., GAO Dec. B-407709.5, 2013 CPD ¶ 281 (Dec. 3, 2013):

We recognize that it is not unusual for individual evaluator ratings to differ from one another, or from the consensus ratings eventually assigned. Systems Research and Applications Corp.; Booz Allen Hamilton, Inc., B–299818 et al., Sept. 6, 2007, 2008 CPD ¶28 at 18. Indeed, the reconciling of such differences among evaluators' viewpoints is the ultimate purpose of a consensus evaluation. J5 Sys., Inc., B–406800, Aug. 31, 2012, 2012 CPD ¶252 at 13; Hi–Tec Sys., Inc., B–402590, B–402590.2, June 7, 2010, 2010 CPD ¶156 at 5. Likewise, we are unaware of any requirement that every individual evaluator's scoring sheet track the final evaluation report, or that the evaluation record document the various changes in evaluators' viewpoints. J5 Sys., Inc., supra, at 13 n.15; see Smart Innovative Solutions, B–400323.3, Nov. 19, 2008, 2008 CPD ¶220 at 3. The overriding concern for our purposes is not whether an agency's final evaluation conclusions are consistent with earlier evaluation conclusions (individual or group), but whether they are reasonable and consistent with the stated evaluation criteria, and reasonably reflect the relative merits of the proposals. See, e.g., URS Fed. Tech. Servs., Inc., B–405922.2, B–405922.3, May 9, 2012, 2012 CPD ¶155 at 9 (a consensus rating need not be the same as the rating initially assigned by the individual evaluators); J5 Sys., Inc., supra, at 13; Naiad Inflatables of Newport, B–405221, Sept. 19, 2011, 2012 CPD ¶37 at 11.

From JAM Corp., GAO Dec. B-408755, 2013 CPD ¶ 282 (Dec. 4, 2013), Footnote 3:

In challenging the agency's evaluation of JAM's technical proposal, the protester largely relies upon disagreements and alleged inconsistencies in the pre-negotiation evaluation and individual evaluator's worksheets. However, individual evaluator ratings may differ, and in certain instances, differ significantly, from one another, or from the consensus ratings eventually assigned; indeed, the reconciling of such differences among evaluators' viewpoints is the ultimate purpose of a consensus evaluation. Neeser Constr., Inc./Allied Builders Sys., A Joint Venture, B–285903, Oct. 25, 2000, 2000 CPD ¶207 at 4. The overriding concern is not whether the final ratings are consistent with individual ratings, but rather, whether the agency's final consensus ratings reasonably reflect the relative merits of the proposals, consistent with the terms of the solicitation. Id. The same holds true for differences between preliminary ratings and the final consensus evaluation; allegations that the consensus evaluation report was inconsistent with individual evaluators' notes or preliminary findings, without more, are without merit. J5 Sys., Inc., B–406800, Aug. 31, 2012, 2012 CPD ¶252 at 13. Here, we have reviewed the numerous allegations based on the discrepancies between the individual evaluators' worksheets, and between the pre-negotiation evaluation and the final evaluation, and they provide no basis upon which to sustain this protest.
Guest Vern Edwards
Feb 28, 2014 · 12y ago
Joel, you wrote:

If the individual evaluation sheets go into the SS files you are simply providing fodder for the attorney representing a protestor. I've witnessed some disastrous depositions of board members who tried to explain how the board rated a factor much lower than they did in their initial review. One guy was so 'flustrated' that he forgot his name. When the KO tried to explain in her deposition all the differences between the individual ratings and the consensus, she couldn't do it. She had no technical expertise in the subject matter . She broke down in tears and later tore up her KO warrant and switched jobs. My lesson learned was to never again give the board members a sheet with a space to fill in the rating. A new KO was assigned along with a totally new Board, who reached the same conclusions that tihe first board did. This time the Goverment's [sic] prevailed in the second protest.

You learned the wrong lesson. Differences between individual evaluator worksheets and consensus reports provide fodder for nothing as long as the source selection decision maker does not look at them. See Watts-Weitz, JV, GAO Dec. B-405475, 2011 CPD ¶ 247 (Nov. 8, 2011), Footnote 9:

During the development of the protest, Watts challenged the failure of the agency to produce individual evaluators' work sheets, which the agency could not locate. Where, as here, the record shows that the agency decision-maker relied not on the individual evaluators' worksheets but, rather, a consensus report produced from those individual evaluations, any statements made in those individual evaluations would have provided no basis on which to sustain the protest. See Vocus Inc., B–402391, Mar. 25, 2010, 2010 CPD ¶80 at 4–5. We consider the record adequate if the consensus documents and source selection decision sufficiently document the agency's rationale for the evaluation. Alliance Tech. Servs., Inc., B–311329, B–311329.2, May 30, 2008, 2008 CPD ¶108 at 3.

See also Tech Systems, Inc. v. U.S., 98 Fed. Cl. 228 (May 11, 2011), which includes an extended discussion of the relevance of individual evaluator worksheets in a source selection and in which Judge Wolski states:

But even had these worksheets been preserved and included in the administrative record, their contents would have had little bearing on this protest. This is due to the USCG's use of a consensus approach in the technical evaluation. The purpose of court review of a procurement decision is not to decide whether, in the court's estimation, the best offeror won, but rather to determine if a reasonable basis exists for the agency decision. See Weeks Marine, Inc. v. United States, 575 F.3d 1352, 1371 (Fed.Cir.2009) (explaining that it is irrelevant whether the reviewing court “might, as an original proposition, have reached a different conclusion”). In other words, the question is whether the offeror whose proposal the government reasonably thought was best won. If the agency were to misunderstand factual information in a proposal that is not open to interpretation, or make findings that are internally inconsistent, then its award decision may not reflect what its opinion would have been absent these discrepancies. Accordingly, the Court looks at the why behind the agency decision, “verifying that objective elements contained in the agency's analysis, such as the description of the offeror's narrative, correspond to the evidence in the record, ... and checking to see if subjective judgments are reached elsewhere in the analysis that contradict the evaluators' conclusions, ... making the decision too ‘implausible.’ ” USfalcon, 92 Fed.Cl. at 462 (citations omitted). When evaluations are the product of a consensus, a court could not say with confidence that the subjective judgments of any (or even all) individual evaluators truly contradict the conclusions of the team as a whole—as even if each individual evaluator independently thought an aspect of a proposal was of one quality, after conferring the entire group could change its minds. Cf. Fort Carson, 71 Fed.Cl. at 604 (recognizing an agency's “prerogative to change its mind”). 
Thus, the individual worksheets would be, at best, very weak evidence of the arbitrariness of the SSA's decision.

Why should the source selection authority look at individual evaluator worksheets? Once a consensus report has been prepared, the evaluator worksheets should go into a file, not to be looked at by the source selection authority. (I would not destroy them. But of course, I'm me.)

The poor CO who cried and then tore up her warrant was wise to find other work.
joel hoffman
Feb 28, 2014 · 12y ago
Vern, I said that the individual ratings provided "fodder for the attorney" representing the protestor. I didn't say that the "fodder" had any merit or benefit to the protestor. Lawyers who specialize in protests sometimes have a tendency to do what is necessary to cloud any issue they can. It results in much wasted Government resources and time responding to the allegations of the protestor, regardless of the merit or lack thereof with respect to the argument. It appears to me that some protest attorneys use such tactics to impress their client and to justify more billable hours.

Several years after that protest, I was involved in one in Huntsville as the chairman of an SSEB for a service contract. One unsuccessful proposer protested and asked for the individual evaluation sheets as part of the government's documentation of the evaluation and SS decision. The evaluation sheets had only consisted of comments, no rating.

Well, I had (previously) thrown them away because they were simply notes taken during the initial reviews. The Board had developed consensus comments, which I documented during the meeting then finalized afterwards. The KO had already agreed with me that they were unnecessary to keep. The lawyer amended the protest to complain about us not keeping the notes. Our lawyer was Steve Feldman, who initially agreed with the protestor. However, the KO and I convinced Steve that the RFP clearly stated that the evaluation would be done through a consensus and that individual notes were not relevant. The GAO agreed with us on that point as well as on every other point of the protest. I don't remember the case number or title as it was probably 12 years ago or more.

Subsequently, HQUSACE Office of Counsel put out a Corps-wide edict to retain and file the individual evaluation sheets. We now teach in at least one class on evaluating proposals not to have individuals include ratings on their initial, individual evaluations because it is useless and is used by lawyers to needlessly muddy the waters, should there be a protest.

My point last night was intended to mean that it doesn't matter whether evaluation notes have any merit to a protest. The lawyers are going to try to cloud the issues. Responding to these useless arguments diverts a great deal of time, cost and resources which are needed for other duties. It is a waste of individual evaluators' time to try to figure out the rating criteria on their own and to assign an individual rating based solely on their personal comments. Personal opinions and observations often change or are added to when the whole team provides additional input and hashes out the final consensus. The consensus process also standardizes the perspective and the system used to rate proposals.

Since our organization is funded by project monies, all time spent on protests costs the taxpayer and the client. I have no doubt that defending the lack of notes was much cheaper than the time it would have taken to respond to any arguments that the notes didn't all match the consensus rating and evaluations. Plus Mr. Feldman learned something and helped us slam dunk the protest.

The factor rating depends upon all the underlying comments and should be the last step, not the first step in developing the consensus evaluation of a factor.
Guest Vern Edwards
Feb 28, 2014 · 12y ago
Joel:

Are you through editing your post now? If so, then what you wrote is one way to see the thing. An alternative way is that a rating on a worksheet is an expression of an evaluator's tentative opinion when the review of the proposal was still fresh in his or her mind and before discussing things with his colleagues. A "worksheet" is, presumably, just that, and I don't see how putting a rating on a worksheet gives a lawyer any more leverage in "clouding the issues" than leaving it off.

Frankly, I'm not sure what you mean by "cloud the issues", unless you mean to make the issues unclear or uncertain. It seems to me that lawyers might try to create an issue, or state or restate an issue in terms favorable to their case, but not "cloud" one. Why would they want to make an issue less clear? In my experience, they usually want to state an issue in stark terms. What they might want to do in a protest is show that a selection decision was inconsistent with the terms of the solicitation or that a conclusion that one offeror was better than another was unreasonable or even irrational. To the extent that they seize upon differences among evaluators or differences between worksheet ratings and a consensus rating, they are barking up the wrong tree. The protest tribunals won't buy it unless the decision maker says that she relied on the worksheets instead of the consensus rating or relied on both, in which case she will have to show how she reconciled any inconsistencies, which would not be an unreasonable thing to ask of her.

Instead of prohibiting evaluators from putting preliminary ratings on their worksheets, I think an agency should tell evaluators to use their worksheets (1) to record preliminary findings and first impressions and (2) to refresh their memories of those findings and impressions during the discussions among themselves leading to the consensus rating. They should be told, however, that the consensus rating must ultimately be based on their consensus factual findings with respect to the proposals, which should in turn be based on their review, re-examination, and reconsideration of their preliminary findings and should make no mention of the worksheets. The source selection decision should be based entirely on the consensus findings and rating, also without reference to worksheets. Any lawyer who tries to make an issue out of differences between preliminary ratings on the evaluator worksheets and such consensus ratings would thus be barking up the wrong tree.

If the agency does it that way, and if its lawyers cannot easily make short work of any argument based on differences between worksheet ratings and a consensus rating, then it has bigger problems than worksheet ratings, because it has incompetent lawyers. And as we have discussed here many times, incompetent people often try to make their jobs easier by imposing dumb rules on other people.

If I were a CO being deposed for a protest and the protester's lawyer asked me to explain differences between ratings on individual worksheets and in a consensus report, I would say that I couldn't, because I never examined the worksheets, because the worksheets were for the use of the evaluators during consensus deliberations and the only thing I looked at was the consensus report. Then, instead of crying, I would smile.
lacylu
Mar 12, 2014 · 12y ago
Just wanted to send a quick note to Mr. Hoffman to clarify my initial posting. The evaluators were not rating during their initial proposal review. The ratings were completed by each evaluator upon completion of their review. At that point consensus started.

My initial question had to do with changing the initial write-ups after consensus to correlate with the consensus rating.

Sorry for any confusion.
joel hoffman
Mar 12, 2014 · 12y ago
Thanks, Lacylu. Sorry that I expanded the issue. At any rate, you know my opinion concerning the practice of evaluators individually assigning ratings when the plan is to develop a consensus rating after jointly documenting the basis for the rating. I'm not discouraging individuals recording comments during their initial review. That is exactly what we told the evaluators to do for the reasons that Vern described above. Just don't assign ratings at that point.

The rating will naturally "fall out" by applying the rating scheme to the group's list of strengths, weaknesses, deficiencies and other applicable comments. In my opinion, there is no need to waste time and resources to individually assign initial ratings. Some may disagree.
joel hoffman
Mar 17, 2014 · 12y ago

Just wanted to send a quick note to Mr. Hoffman to clarify my initial posting. The evaluators were not rating during their initial proposal review. The ratings were completed by each evaluator upon completion of their review. At that point consensus started.

My initial question had to do with changing the initial write-ups after consensus to correlate with the consensus rating.

Sorry for any confusion.

Lacylu, concerning your initial question, see B-402429, James Construction, April 21, 2010, for example, where the Protestor "...also complains that the agency's [this was an Army agency] consensus ratings did not bear a rational relationship to the TEP member's individual ratings, and the TEP member's individual rating sheets appear to have been altered to justify a lower overall consensus score." As to your question, in the example here, 2 INDIVIDUAL RATINGS were adjusted and 1 INDIVIDUAL RATING was not adjusted. Only one of three INDIVIDUAL RATINGS match the consensus rating. Two did not. There is no record or discussion of changes in the individual evaluators' underlying record of comments concerning strengths, weaknesses, etc.

See the Decision at: http://www.gao.gov/products/A89558#mt=e-report

Here, the source selection evaluation board consisted of a technical evaluation team (TET) and a price evaluation team. Each TET member individually evaluated and rated the proposals. The TET later met as a team to discuss the individual ratings and arrive at a consensus rating for each evaluation factor listed in the RFP.

The Decision includes a matrix showing the three individual TET evaluators' preliminary RATINGS and the consensus rating for each factor. Two of the three individuals adjusted their ratings during the consensus rating process. One of those two then matched the consensus rating and the other did not fully match the consensus rating. The third person did not adjust his/her ratings, and two of those ratings don't match the consensus. The Decision appears to reflect that the consensus ratings were based upon an assessment of the proposals' strengths and weaknesses and, I presume, proposal risks.

The Decision states:

"...As an initial matter, we find no support in the record for James' claims that the individual rating sheets may have been altered in order for James to have a lower overall consensus score. In this regard, the agency provided detailed statements from the TEP members and chairman, which are consistent with the contemporaneous documentation, explaining the reasons for the changes in the ratings, and how and why the TEP determined the consensus ratings.

With respect to the allegation that there is a discrepancy between the initial and consensus ratings, it is not unusual for individual evaluator ratings to differ significantly from one another, or from the consensus ratings eventually assigned. In this regard, such ratings properly may be determined after discussions among the evaluators, which is what occurred and was adequately documented here. Joint Mgmt. & Tech. Servs., B-294229, B-294229.2, Sept. 22, 2004, 2004 CPD para. 208 at 4; I.S. Grupe, Inc., B-278839, Mar. 20, 1998, 98-1 CPD para. 86 at 5-6. The overriding concern for our purpose is not whether the final ratings are consistent with earlier, individual ratings but, again, whether they reasonably reflect the relative merits of the proposals. Brisk Waterproofing Co., Inc., supra, at 2 n.1. Based on our review, we see nothing unreasonable about the changes between the initial ratings assigned, and the final consensus evaluation ratings.

With regard to past performance, the agency explained that, even though the initial individual ratings displayed two good ratings and an acceptable rating, the consensus of the panel was that James' proposal should be rated acceptable after considering the relevance of the projects reflected in James' performance questionnaires and the CCASS evaluations. For the reasons stated above, we conclude that the acceptable past performance rating reasonably reflects the relative merits of James' past performance.

Similarly, with regard to the quality of building systems and materials evaluation factor, the individual evaluators initially scored James' proposal with two acceptable ratings and one acceptable/marginal rating, but the consensus rating was marginal. Again, during the consensus discussions the evaluators noted that James' proposal did not satisfy four of the nine criteria in the RFP. Thus, the consensus of the panel was that a marginal rating reasonably reflected the merits of James' proposal. As discussed above, we see nothing unreasonable about this rating."

In my personal opinion, by using individual ratings and then having to justify/defend how the ratings were adjusted during the consensus rating process, the government entity appears to have expended considerable resources, time and money that would not have been necessary had they not individually rated the proposals during their initial reviews. Because (I know that) the organization involved is project funded on a reimbursable basis for such activities, those were real expended costs and real resources.

The DoD and the Army have standardized criteria for establishing ratings and proposal risks. The factor ratings are, by design, dependent upon the underlying documentation of the proposal strengths, weaknesses, deficiencies, risks, etc. that are developed during the consensus process. Thus, there is no need to bother developing individual ratings. Individual evaluators should instead list their comments for use during the consensus process.
joel hoffman
Mar 17, 2014 · 12y ago
Lacylu, see also today's WIFCON Protest summary of Comptroller General Decisions: "Evaluators' handwritten evaluation sheets, destroyed, summary document. See Custom Pak, Inc.; M-Pak, Inc., B-409308, B-409308.2, B-409308.3, B-409308.4: Mar 4, 2014. (March 17, 2014)"

Here is the WIFCON link to the Decision: /legacy/a/dd8fdc5e43c024e1.pdf

I am not suggesting here that the government should destroy the individual evaluators' notes. This just happens to be a recent Decision that is related to the discussion in this thread, for the archives.
joel hoffman
Apr 15, 2014 · 12y ago
Here is another instance on today's WIFCON Home Page under the Protest column of a source selection where evaluators individually rated (scored) proposals, then the consensus rating process (duh) resulted in differences. Once again, a disappointed offeror unsuccessfully argued that the source selection decision was flawed because of the differences between individual evaluators' ratings and the consensus ratings. There should be no need to individually score proposals before the consensus rating process. Instead, the evaluators should focus on identifying strengths, weaknesses, deficiencies, concerns, uncertainties, etc.

From the WIFCON Legal pages at

FAR 15.305 (a)(3): Proposal Evaluation - Evaluator's scoring

"MSI next argues that the contracting officer’s best value decision was flawed because “the final TEC consensus scores upon which he relied are inconsistent with the final scores assigned by the individual TEC members.” Comments at 19. MSI also asserts that, because the record contains no explanation of why the scores changed between the individual evaluation worksheets and the final consensus report, the protest must be sustained for failure to adequately document the source selection decision. Id. Our review of the record affords us no basis to question the agency’s evaluation.

It is not unusual for individual evaluator ratings to differ significantly from one another, or from the consensus ratings eventually assigned; indeed, the reconciling of such differences among evaluators’ viewpoints is the ultimate purpose of a consensus evaluation. J5 Systems, Inc., B-406800, Aug. 31, 2012, 2012 CPD ¶ 252 at 10. Our overriding concern is not whether an agency’s final ratings are consistent with preliminary ratings, but whether they reasonably reflect the relative merits of the proposals, consistent with the solicitation. Id. Further, our Office has consistently held that numerical point scores are useful only as guides for intelligent decision-making and are not generally controlling for award because they often reflect the disparate, subjective judgments of the evaluators. National Medical Seminars Tempharmacists, B-233452, Feb. 22, 1989, 89-1 CPD ¶ 191 at 2.

We are unaware of any requirement that every individual evaluator’s scoring sheet track the final evaluation report, or that the evaluation record document the various changes in evaluators’ viewpoints. J5 Systems, Inc., supra at 13 n.15. More importantly, our review of the record does not lead us to conclude that the agency’s evaluation was objectionable. The alleged inconsistencies upon which the protester asks us to sustain this protest amount to nothing more than quibbling with the minutia of the agency’s scoring of proposals. The probative question here, however, is not whether the agency’s point scores were off by one or two points, but whether the agency properly justified paying a price premium for IRG’s technically superior solution. We next turn to MSI’s contention in this regard. (Management Systems International, Inc., B-409415, B-409415.2: Apr 2, 2014) (pdf)"

Regardless of prevailing in the defense of the protestor's argument, the point is that individually scoring proposals resulted in wasted time and resources spent first in assigning individual ratings based upon one set of eyes and one set of comments. Then more time and resources were wasted in developing a response to allegations that had no merit because the only relevant TEC ratings were based upon a consensus process after input from several sets of eyes and after documenting the group's final collection of underlying comments. The group would also have generally used a more consistent approach to applying the RFP's rating scheme than individuals did.

Of course, the source selection authority may override that group's evaluation and should thoroughly explain his/her rationale for doing so.

This protest was also interesting in showing how government use of numerical rating systems often appears to emphasize precise differences between proposals. The GAO noted above "Further, our Office has consistently held that numerical point scores are useful only as guides for intelligent decision-making and are not generally controlling for award because they often reflect the disparate, subjective judgments of the evaluators." In addition, the protestor "...also challenge[d] the agency’s reliance on technical point scores arguing that any perceived difference between the proposals was the 'result of false precision.' Comments at 30 - 36." The protestor argued that the difference in scores should have been "4.62 points, instead of the assigned differential of 5.33 points." SHEESH.